Skip to content

refactor: remove pyarrow dependency #3459

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 24, 2025

Conversation

ion-elgreco
Copy link
Collaborator

@ion-elgreco ion-elgreco commented May 24, 2025

Description

This refactor makes pyarrow optional, and only used for the read paths. All data/schema interfaces are using arrow c data interfaces and return Arro3 RecordBatches or RecordBatchReaders (backed by arrow-rs <3). Even our own DeltaSchema can return PyCapsules now. Schema conversion code is also done with arro3, and the casting is done in Rust with a RecordBatchReader wrapper.

Related Issue(s)

Documentation

I will add this in a followup in the migration guide :)

@github-actions github-actions bot added the binding/python Issues for the Python package label May 24, 2025
Copy link

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@ion-elgreco ion-elgreco marked this pull request as draft May 24, 2025 00:35
@ion-elgreco ion-elgreco force-pushed the refactor/migrate-to-pyo3-arrow branch from 00cd58d to 2175e3c Compare May 24, 2025 00:48
Copy link

codecov bot commented May 24, 2025

Codecov Report

Attention: Patch coverage is 0% with 191 lines in your changes missing coverage. Please review.

Project coverage is 71.64%. Comparing base (2fe2ec5) to head (e1865b9).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
python/src/schema.rs 0.00% 92 Missing ⚠️
python/src/lib.rs 0.00% 42 Missing ⚠️
python/src/writer.rs 0.00% 39 Missing ⚠️
python/src/merge.rs 0.00% 12 Missing ⚠️
python/src/query.rs 0.00% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3459      +/-   ##
==========================================
- Coverage   71.72%   71.64%   -0.08%     
==========================================
  Files         152      152              
  Lines       46577    46619      +42     
  Branches    46577    46619      +42     
==========================================
- Hits        33407    33402       -5     
- Misses      11024    11067      +43     
- Partials     2146     2150       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ion-elgreco ion-elgreco changed the title Refactor/migrate to pyo3 arrow refactor: remove pyarrow dependency May 24, 2025
@github-actions github-actions bot added the documentation Improvements or additions to documentation label May 24, 2025
@ion-elgreco ion-elgreco force-pushed the refactor/migrate-to-pyo3-arrow branch 2 times, most recently from 3277a9d to 76590d3 Compare May 24, 2025 14:21
@ion-elgreco ion-elgreco marked this pull request as ready for review May 24, 2025 14:22
@ion-elgreco ion-elgreco requested a review from rtyler as a code owner May 24, 2025 14:22
@ion-elgreco ion-elgreco force-pushed the refactor/migrate-to-pyo3-arrow branch from 76590d3 to 6f52fa6 Compare May 24, 2025 14:27
Copy link
Collaborator

@roeap roeap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work!! This must have been sofe "fun" work rewriting all this stuff :). But well worth it!

Just left a few comments to look at.

Comment on lines +291 to +298
fn __arrow_c_schema__<'py>(&self, py: Python<'py>) -> PyArrowResult<Bound<'py, PyCapsule>> {
let inner_type = DataType::Array(Box::new(self.inner_type.clone()));
let arrow_type: ArrowDataType = (&inner_type)
.try_into()
.map_err(|err: ArrowError| PyException::new_err(err.to_string()))?;

to_schema_pycapsule(py, arrow_type)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SHould we maybe add a todo / issue to try and make this "only-once" copy with a OnceCell o.a. I think the capsule interface at least strongly encourages zero copy semantics. While thats harto to achieve, we can maybe get closer?

roeap
roeap previously approved these changes May 24, 2025
Copy link
Collaborator

@roeap roeap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@ion-elgreco ion-elgreco enabled auto-merge May 24, 2025 19:19
@ion-elgreco ion-elgreco force-pushed the refactor/migrate-to-pyo3-arrow branch from ba6d930 to ed72d67 Compare May 24, 2025 19:24
roeap
roeap previously approved these changes May 24, 2025
auto-merge was automatically disabled May 24, 2025 19:27

Pull request was closed

@ion-elgreco ion-elgreco reopened this May 24, 2025
name: Python Build (Python 3.10 PyArrow latest)
name: Python Build (Python 3.10 Optional latest pyarrow)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ion-elgreco - maybe we need to revert this?

@ion-elgreco ion-elgreco force-pushed the refactor/migrate-to-pyo3-arrow branch from ed72d67 to 9ea1a4e Compare May 24, 2025 19:31
Signed-off-by: Ion Koutsouris <[email protected]>
@ion-elgreco ion-elgreco force-pushed the refactor/migrate-to-pyo3-arrow branch from 9ea1a4e to e1865b9 Compare May 24, 2025 19:31
@ion-elgreco ion-elgreco enabled auto-merge May 24, 2025 19:32
@ion-elgreco ion-elgreco added this pull request to the merge queue May 24, 2025
Merged via the queue into delta-io:main with commit 888b03f May 24, 2025
23 of 25 checks passed
@ion-elgreco ion-elgreco deleted the refactor/migrate-to-pyo3-arrow branch May 24, 2025 20:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/python Issues for the Python package documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove pyarrow dependency (make opt-in), replace with arro3 for core components
2 participants