-
Notifications
You must be signed in to change notification settings - Fork 326
Add support for Bodo DataFrame #2167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@ehsantn looks like theres an issue with the dependency resolution
|
@kevinjqliu Thanks for the quick review. Bodo requires Python >=3.10 since Python 3.9 has been removed by some dependency packages quite a while ago. Do all optional dependencies of PyIceberg need to support Python 3.9? What do you recommend? https://numpy.org/neps/nep-0029-deprecation_policy.html#support-table (Python 3.9 is removed since Apr 05, 2024). |
optional = true | ||
python-versions = ">=3.9" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like numpy for 3.9 is removed here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Numpy is here: https://github.com/ehsantn/iceberg-python/blob/f36265b8cdc9fa3056ad28784467579514cfc850/poetry.lock#L3424
I'm working on packaging Bodo for Python 3.9 to avoid these Poetry issues: bodo-ai/Bodo#637
Our team will just miss structured pattern matching and better type hints of Python 3.10 :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sg 3.9 is also EOL in a few months (2025-10)
https://devguide.python.org/versions/#supported-versions
Ok, updated Bodo to support Python 3.9 so this should work now. Tried |
@ehsantn i merged a few library upgrades. could you rebase this PR? |
Done. I assume the CI failure is not related to this PR? The test doesn't seem relevant. |
maybe try rebase |
Done. No idea why the CI fails here in unrelated tests. Maybe some dependency got upgraded in the lock file? |
Looks like the Bodo test is actually failing (logs are not very visible). Seems to be just a configuration issue. Working on a fix. |
All tests are passing locally for me now. Hopefully the CI works too. |
under_20_arrow = version.parse(pyarrow.__version__) < version.parse("20.0.0") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should find another way to make these tests pass instead of branching on pyarrow version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any ideas? Maybe use a range of "safe" values instead of a single file size value? I'd be happy to open another PR if there is more work for this.
Bodo is currently pinned to Arrow 19 since the current release version of PyIceberg supports up to Arrow 19. Bodo uses Arrow C++, which currently requires pinning to a single version for pip wheels to work (conda-forge builds against 4 latest Arrow versions in this case but pip doesn't support this yet). It'd be great if PyIceberg wouldn't set an upper version for Arrow if possible.
@kevinjqliu please advise on next steps. This PR looks ready to merge to me and the flakiness of existing unit tests should be addressed separately (would be happy to contribute if priority). |
Rationale for this change
Adds support for Bodo DataFrame library, which is a drop in replacement for Pandas that accelerates and scales Python code automatically by applying query, compiler and HPC optimizations.
Are these changes tested?
Added integration test.
Are there any user-facing changes?
Adds
Table.to_bodo()
function. Example code: