[C++] Footer dumps for IPC Feather/ Parquet #46505
jayspomodoro started this conversation in Ideas
Replies: 2 comments 6 replies
Hi @jayspomodoro, that sounds like an interesting approach. Do you have a demo you could share here? It would help make it clearer what your end-to-end workflow is, and it might save others time when evaluating it.
Can we use the Apache Arrow IPC Streaming Format (https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format) instead of the Apache Arrow IPC File Format (https://arrow.apache.org/docs/format/Columnar.html#ipc-file-format) or Apache Parquet for this use case?
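To illustrate why a streaming layout helps here, below is a minimal sketch of a reader consuming a stream that is still being written. It is a toy length-prefixed framing written with the Python standard library only, not the actual Arrow IPC encoding and not any Arrow API. The names `write_frame` and `read_complete_frames` are hypothetical. The point it shows is that when every message carries its own size up front, a reader can consume all complete messages without waiting for a footer.

```python
import io
import struct


def write_frame(stream, payload: bytes) -> None:
    """Append one message: a 4-byte little-endian length, then the payload."""
    stream.write(struct.pack("<I", len(payload)))
    stream.write(payload)


def read_complete_frames(data: bytes):
    """Return (frames, bytes_consumed) for every fully written frame.

    A trailing frame whose payload is not fully present yet is skipped,
    so the reader can retry later starting from bytes_consumed.
    """
    frames = []
    pos = 0
    while pos + 4 <= len(data):
        (size,) = struct.unpack_from("<I", data, pos)
        if pos + 4 + size > len(data):
            break  # the writer has not finished this frame yet
        frames.append(data[pos + 4 : pos + 4 + size])
        pos += 4 + size
    return frames, pos


# Simulate a writer that has emitted a schema message and one batch,
# and is halfway through writing a second batch.
buf = io.BytesIO()
write_frame(buf, b"schema: ts,value")
write_frame(buf, b"batch-0")
partial = buf.getvalue() + struct.pack("<I", 7) + b"bat"

frames, consumed = read_complete_frames(partial)
print(frames, consumed)
```

The trade-off of a layout like this is that there is no index of batches, so a reader has to scan from the start rather than seek directly to a given batch.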
In many discussions I have seen that Feather/Parquet files cannot be read until they are complete. More precisely, a file becomes readable only once it is closed, and after that no further writing is possible. This is because the reader requires the footer, which contains the metadata needed to decipher the contents.
Some context: my own requirement for this feature was to continuously store new data from some process (say, time-series data) in a memory-efficient and reader-friendly manner, so that it can be read while it is still being written. Row-based storage (like CSV and Avro) does not really help much here, since I am talking about huge files. Column-based storage has the potential for micro-batch processing (not exactly streaming), but has the problem that a file cannot be read while it is being written.
My solution, or at least a part of it, has been to change the exposed API so that the footer can be dumped to a separate file. I wanted to be a part of the ongoing development by sharing this knowledge. Let me know if this would be useful to anyone!
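To make the idea concrete, here is a rough sketch of what a footer kept in a separate file could look like. This is not my actual patch and not Arrow's or Parquet's API: it is a toy format written with the Python standard library only, and `SidecarWriter`, `read_batches`, the JSON footer layout and the file names are all hypothetical. It assumes the footer is simply a list of (offset, length) pairs that is rewritten after every batch.

```python
import json
import os
import tempfile


class SidecarWriter:
    """Toy writer: appends batches to a data file and, after each batch,
    dumps a footer describing all complete batches to a separate file."""

    def __init__(self, data_path: str, footer_path: str) -> None:
        self.footer_path = footer_path
        self.batches = []  # [offset, length] of every complete batch
        self.data = open(data_path, "wb")

    def write_batch(self, payload: bytes) -> None:
        offset = self.data.tell()
        self.data.write(payload)
        self.data.flush()
        os.fsync(self.data.fileno())  # data is on disk before the footer names it
        self.batches.append([offset, len(payload)])
        self._dump_footer()

    def _dump_footer(self) -> None:
        # Write to a temporary file and rename it, so a concurrent reader
        # sees either the previous footer or the new one, never a torn one.
        tmp_path = self.footer_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"batches": self.batches}, f)
        os.replace(tmp_path, self.footer_path)

    def close(self) -> None:
        self.data.close()


def read_batches(data_path: str, footer_path: str) -> list:
    """Read only the batches that the sidecar footer lists as complete."""
    with open(footer_path) as f:
        footer = json.load(f)
    result = []
    with open(data_path, "rb") as f:
        for offset, length in footer["batches"]:
            f.seek(offset)
            result.append(f.read(length))
    return result


workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "series.data")
footer_path = os.path.join(workdir, "series.footer.json")

writer = SidecarWriter(data_path, footer_path)
writer.write_batch(b"batch-0")
writer.write_batch(b"batch-1")
seen_while_open = read_batches(data_path, footer_path)  # writer is still open

writer.write_batch(b"batch-2")
writer.close()
seen_after_close = read_batches(data_path, footer_path)
print(seen_while_open, seen_after_close)
```

The reader never has to wait for the data file to be closed. It only trusts what the latest footer dump describes, so a batch that is still being written is simply not visible yet.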