-
Notifications
You must be signed in to change notification settings - Fork 488
refactor: remove pyarrow dependency #3459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: remove pyarrow dependency #3459
Conversation
ACTION NEEDED delta-rs follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. |
00cd58d
to
2175e3c
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3459 +/- ##
==========================================
- Coverage 71.72% 71.64% -0.08%
==========================================
Files 152 152
Lines 46577 46619 +42
Branches 46577 46619 +42
==========================================
- Hits 33407 33402 -5
- Misses 11024 11067 +43
- Partials 2146 2150 +4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
3277a9d
to
76590d3
Compare
Signed-off-by: Ion Koutsouris <[email protected]>
76590d3
to
6f52fa6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work!! This must have been sofe "fun" work rewriting all this stuff :). But well worth it!
Just left a few comments to look at.
fn __arrow_c_schema__<'py>(&self, py: Python<'py>) -> PyArrowResult<Bound<'py, PyCapsule>> { | ||
let inner_type = DataType::Array(Box::new(self.inner_type.clone())); | ||
let arrow_type: ArrowDataType = (&inner_type) | ||
.try_into() | ||
.map_err(|err: ArrowError| PyException::new_err(err.to_string()))?; | ||
|
||
to_schema_pycapsule(py, arrow_type) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SHould we maybe add a todo / issue to try and make this "only-once" copy with a OnceCell
o.a. I think the capsule interface at least strongly encourages zero copy semantics. While thats harto to achieve, we can maybe get closer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
ba6d930
to
ed72d67
Compare
.github/workflows/python_build.yml
Outdated
name: Python Build (Python 3.10 PyArrow latest) | ||
name: Python Build (Python 3.10 Optional latest pyarrow) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ion-elgreco - maybe we need to revert this?
ed72d67
to
9ea1a4e
Compare
Signed-off-by: Ion Koutsouris <[email protected]>
9ea1a4e
to
e1865b9
Compare
Description
This refactor makes pyarrow optional, and only used for the read paths. All data/schema interfaces are using arrow c data interfaces and return Arro3 RecordBatches or RecordBatchReaders (backed by arrow-rs <3). Even our own DeltaSchema can return PyCapsules now. Schema conversion code is also done with arro3, and the casting is done in Rust with a RecordBatchReader wrapper.
Related Issue(s)
Documentation
I will add this in a followup in the migration guide :)