[JS] Decoding Dictionary<Uint32, Utf8> incorrectly. #46100
Hey everyone! 👋 We have a problem with decoding a Dictionary<Uint32, Utf8>. We want this array:
[
"gke-europe-west3-0-preemptible-t2d-st-ec27d3db-pwwz",
"gke-europe-west3-0-preemptible-t2d-st-add19435-w74v",
"gke-europe-west3-0-preemptible-t2d-st-717888db-nrfr"
]
but we get this:
[
"gke-europe-west3-0-preemptible-t2d-st-ec27d3db-pwwz",
"gke-europe-west3-0-preemptible-t2d-st-ec27d3db-pwwz",
null
]
It seems like we're reading the same string twice, and then in the second batch we end up with null.

Reproducing

For us, there's a stream with incoming Arrow Flight Data chunks. The Flight Data is transformed into chunks using similar code to this:

The chunks are base64 encoded for reproducibility in the unit test.

import { tableFromIPC, StructRow } from "apache-arrow";

describe('Arrow Reader', () => {
const chunksDistinct = [
'3AAAABAAAAAAAAoADAAKAAkABAAKAAAAEAAAAAABBAAIAAgAAAAEAAgAAAAEAAAAAQAAABQAAAAQABQAEAAOAA8ABAAAAAgAEAAAABgAAAAMAAAAAAABDXAAAAABAAAAGAAAALD///8QABgAFAAOAA8ABAAQAAgAEAAAADwAAAAwAAAAAAABBRAAAAAwAAAACAAKAAAABAAIAAAADAAAAAAABgAIAAQABgAAACAAAAAAAAAABAAEAAQAAAAEAAAAbm9kZQAAAAATAAAAYXR0cmlidXRlc19yZXNvdXJjZQA=',
'qAAAABAAAAAMABgAFgAVAAQACAAMAAAAHAAAAMAAAAAAAAAAAAAAAAACBAAIAAoAAAAEAAgAAAAQAAAAAAAKABgADAAIAAQACgAAACwAAAAQAAAAAQAAAAAAAAAAAAAAAQAAAAEAAAAAAAAAAAAAAAAAAAAAAAAAAwAAAAAAAAAAAAAAAQAAAAAAAABAAAAAAAAAAAgAAAAAAAAAgAAAAAAAAAAzAAAAAAAAAP8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAZ2tlLWV1cm9wZS13ZXN0My0wLXByZWVtcHRpYmxlLXQyZC1zdC1lYzI3ZDNkYi1wd3d6AAAAAAAAAAAAAAAAAA==',
'qAAAABAAAAAMABoAGAAXAAQACAAMAAAAIAAAAMAAAAAAAAAAAAAAAAAAAAMEAAoAGAAMAAgABAAKAAAAPAAAABAAAAABAAAAAAAAAAAAAAACAAAAAQAAAAAAAAAAAAAAAAAAAAEAAAAAAAAAAAAAAAAAAAAAAAAAAwAAAAAAAAAAAAAAAQAAAAAAAABAAAAAAAAAAAEAAAAAAAAAgAAAAAAAAAAEAAAAAAAAAP8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==',
'qAAAABAAAAAMABgAFgAVAAQACAAMAAAAHAAAAAABAAAAAAAAAAAAAAACBAAIAAoAAAAEAAgAAAAQAAAAAAAKABgADAAIAAQACgAAACwAAAAQAAAAAgAAAAAAAAAAAAAAAQAAAAIAAAAAAAAAAAAAAAAAAAAAAAAAAwAAAAAAAAAAAAAAAQAAAAAAAABAAAAAAAAAAAwAAAAAAAAAgAAAAAAAAABmAAAAAAAAAP8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMwAAAGYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAZ2tlLWV1cm9wZS13ZXN0My0wLXByZWVtcHRpYmxlLXQyZC1zdC1hZGQxOTQzNS13NzR2Z2tlLWV1cm9wZS13ZXN0My0wLXByZWVtcHRpYmxlLXQyZC1zdC03MTc4ODhkYi1ucmZyAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=',
'qAAAABAAAAAMABoAGAAXAAQACAAMAAAAIAAAAMAAAAAAAAAAAAAAAAAAAAMEAAoAGAAMAAgABAAKAAAAPAAAABAAAAACAAAAAAAAAAAAAAACAAAAAgAAAAAAAAAAAAAAAAAAAAIAAAAAAAAAAAAAAAAAAAAAAAAAAwAAAAAAAAAAAAAAAQAAAAAAAABAAAAAAAAAAAEAAAAAAAAAgAAAAAAAAAAIAAAAAAAAAP8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==',
];
  describe('decode SELECT DISTINCT', () => {
    const chunks: Buffer[] = [];
    const chunksString: string[] = [];
    chunksDistinct.forEach((b64) => {
      const decoded = Buffer.from(b64, 'base64');
      chunks.push(decoded);
      // for debugging: a text view of the raw bytes
      chunksString.push(new TextDecoder().decode(decoded));
    });
    console.log(chunksString);

    // Using RecordBatchReader makes no difference
    // const reader = RecordBatchReader.from(chunks)
    // for (const batch of reader) {
    //   console.log(batch.length);
    // }
    const table = tableFromIPC(chunks);

    it('should have the correct data', () => {
      expect(table.numRows).toBe(3);
      expect(table.numCols).toBe(1);
      expect(table.schema.fields[0].toString()).toBe('attributes_resource: Struct<{node:Dictionary<Uint32, Utf8>}>');

      // Collect the decoded node name of every row
      const nodes: (string | null)[] = [];
      table.toArray().forEach((row: StructRow) => {
        nodes.push(row['attributes_resource']['node']);
      });

      expect(nodes).toContain("gke-europe-west3-0-preemptible-t2d-st-ec27d3db-pwwz");
      expect(nodes).toContain("gke-europe-west3-0-preemptible-t2d-st-add19435-w74v");
      expect(nodes).toContain("gke-europe-west3-0-preemptible-t2d-st-717888db-nrfr");
    });
  });
});

Using my debugger, I can introspect these chunks. The expected strings are inside these chunks. Just a thought: maybe we should ignore the chunks at index 2 and 4?

The unit test fails:
I hope the code above runs on your machine. Does arrow-js have a Slack or Discord? I wasn't able to find anything. Thank you!
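For anyone who wants to see what each chunk actually is before deciding whether to skip some of them, here is a rough sketch that reads the message type straight out of the chunk bytes. It assumes each chunk is a single Arrow IPC message (an int32 metadata length followed by the Message flatbuffer, optionally preceded by a 0xFFFFFFFF continuation marker) and that the type values follow the MessageHeader union in Arrow's Message.fbs. The names `fromHex` and `messageHeaderName` are made up for this example, and the hex prefixes are the leading bytes of the first three base64 chunks in the test above, written out by hand.

```typescript
// Message types as listed in the MessageHeader union (0 means "none").
const HEADER_NAMES = ["NONE", "Schema", "DictionaryBatch", "RecordBatch", "Tensor", "SparseTensor"];

// Hypothetical helper: turn a whitespace-separated hex string into bytes.
function fromHex(hex: string): Uint8Array {
  const clean = hex.replace(/\s+/g, "");
  const out = new Uint8Array(clean.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(clean.slice(i * 2, i * 2 + 2), 16);
  }
  return out;
}

// Hypothetical helper: read the header_type field of the Message flatbuffer.
function messageHeaderName(chunk: Uint8Array): string {
  const view = new DataView(chunk.buffer, chunk.byteOffset, chunk.byteLength);
  let pos = 0;
  // Skip the optional continuation marker, then the int32 metadata length.
  if (view.getUint32(pos, true) === 0xffffffff) pos += 4;
  pos += 4;
  const fb = pos; // start of the flatbuffer
  const table = fb + view.getUint32(fb, true); // root table position
  const vtable = table - view.getInt32(table, true); // vtable position
  const vtableSize = view.getUint16(vtable, true);
  // Message fields in order: 0 version, 1 header_type, 2 header, 3 bodyLength.
  const slot = 4 + 2 * 1;
  const fieldOffset = slot < vtableSize ? view.getUint16(vtable + slot, true) : 0;
  const type = fieldOffset === 0 ? 0 : view.getUint8(table + fieldOffset);
  return HEADER_NAMES[type] ?? `unknown(${type})`;
}

// Leading bytes of chunks 0, 1 and 2 from the unit test.
const chunk0Prefix = fromHex(
  "dc000000 10000000 00000a00 0c000a00 09000400 0a000000 10000000 00010400");
const chunk1Prefix = fromHex(
  "a8000000 10000000 0c001800 16001500 04000800 0c000000 1c000000 c0000000 " +
  "00000000 00000000 00020400");
const chunk2Prefix = fromHex(
  "a8000000 10000000 0c001a00 18001700 04000800 0c000000 20000000 c0000000 " +
  "00000000 00000000 00000003 04000a00");

console.log([chunk0Prefix, chunk1Prefix, chunk2Prefix].map(messageHeaderName));
// prints [ 'Schema', 'DictionaryBatch', 'RecordBatch' ]
```

If that reading is right, index 2 is a record batch rather than filler, so dropping it would drop rows. In the real test the same function could be called on every entry of `chunks` directly.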
Replies: 1 comment 7 replies
-
It turns out we're using DictionaryHandling::Resend in our backend. Using DictionaryHandling::Hydrate, things are starting to get decoded just fine. https://arrow.apache.org/rust/arrow_flight/encode/enum.DictionaryHandling.html#variants
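To make the difference a bit more concrete, here is a toy model in TypeScript. It is not the apache-arrow implementation and only a guess at the mechanism: it assumes the stream is dictionary, batch, dictionary, batch, that the batches carry the indices [0] and [0, 1], and that a reader either replaces its dictionary when a new one arrives or keeps the first one. The `Message` type and the `decode` and `hydrate` functions are invented for this sketch.

```typescript
// Toy model only: NOT how apache-arrow decodes IPC streams.
type Message =
  | { kind: "dictionary"; values: string[] }
  | { kind: "batch"; indices: number[] };

const P = "gke-europe-west3-0-preemptible-t2d-st-";

// Assumed shape of a "Resend" stream: the dictionary is sent again
// before every record batch. The index values are assumptions.
const resent: Message[] = [
  { kind: "dictionary", values: [P + "ec27d3db-pwwz"] },
  { kind: "batch", indices: [0] },
  { kind: "dictionary", values: [P + "add19435-w74v", P + "717888db-nrfr"] },
  { kind: "batch", indices: [0, 1] },
];

// Resolve indices against the current dictionary. With replaceOnResend
// set to false, the first dictionary is kept for the whole stream.
function decode(messages: Message[], replaceOnResend: boolean): (string | null)[] {
  let dictionary: string[] | undefined;
  const out: (string | null)[] = [];
  for (const msg of messages) {
    if (msg.kind === "dictionary") {
      if (dictionary === undefined || replaceOnResend) dictionary = msg.values;
    } else {
      for (const i of msg.indices) out.push(dictionary?.[i] ?? null);
    }
  }
  return out;
}

// "Hydrate" in this model: the sender expands indices to plain strings
// per batch, so the receiver never has to track a dictionary.
function hydrate(messages: Message[]): (string | null)[][] {
  let dictionary: string[] = [];
  const batches: (string | null)[][] = [];
  for (const msg of messages) {
    if (msg.kind === "dictionary") dictionary = msg.values;
    else batches.push(msg.indices.map((i) => dictionary[i] ?? null));
  }
  return batches;
}

console.log(decode(resent, true));  // the three distinct node names
console.log(decode(resent, false)); // first name twice, then null
console.log(hydrate(resent));       // plain strings, one array per batch
```

Under these assumptions, the reader that keeps the first dictionary produces the same output we saw (the first name twice, then null), while the hydrated variant sidesteps the problem because no dictionary has to be tracked on the receiving side.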