Leverage Iceberg-Rust for all the transforms #1833

Draft: wants to merge 10 commits into base: main

Contributor

@Fokko Fokko commented Mar 23, 2025

Rationale for this change

Testing the use of Iceberg-Rust for all of the transforms. I think we have a rounding error in apache/iceberg-rust#1128.

Are these changes tested?

Are there any user-facing changes?

(TimestampType(), datetime.datetime(19, 5, 1, 22, 1, 1)),
(TimestamptzType(), datetime.datetime(19, 5, 1, 22, 1, 1, tzinfo=datetime.timezone.utc)),
(TimestampType(), datetime.datetime(2022, 5, 1, 22, 1, 1)),
(TimestamptzType(), datetime.datetime(2022, 5, 1, 22, 1, 1, tzinfo=datetime.timezone.utc)),
Contributor

do you want to merge #1592 into this PR?

@@ -186,8 +186,8 @@ def test_partition_type(table_schema_simple: Schema) -> None:
(DecimalType(5, 9), Decimal(19.25)),
(DateType(), datetime.date(1925, 5, 22)),
(TimeType(), datetime.time(19, 25, 00)),
(TimestampType(), datetime.datetime(19, 5, 1, 22, 1, 1)),
(TimestamptzType(), datetime.datetime(19, 5, 1, 22, 1, 1, tzinfo=datetime.timezone.utc)),
(TimestampType(), datetime.datetime(2022, 5, 1, 22, 1, 1)),
Contributor

what is wrong with 19?

Contributor Author

At first, I thought it was some kind of clock correction somewhere in time that was accounted for by one implementation and not the other. Digging deeper, it looks like there is an issue with negative numbers in general:

>                   assert t.transform(source_type)(value) == t.pyarrow_transform(source_type)(pa.array([value])).to_pylist()[0]
E                   assert -2 == -1
E                    +  where -2 = <function HourTransform.transform.<locals>.<lambda> at 0x143fa6e60>(datetime.datetime(1969, 12, 31, 22, 1, 1))
E                    +    where <function HourTransform.transform.<locals>.<lambda> at 0x143fa6e60> = <bound method HourTransform.transform of HourTransform()>(TimestampType())
E                    +      where <bound method HourTransform.transform of HourTransform()> = HourTransform().transform
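
The failing assertion above is consistent with one code path flooring and the other truncating toward zero when a pre-epoch timestamp is converted to hours. The sketch below reproduces the -2 versus -1 split in plain Python. The helper names `hours_floor` and `hours_truncate` are hypothetical and are not PyIceberg or Iceberg-Rust APIs, and truncation is only an assumed cause, not something this thread confirms.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)
MICROS_PER_HOUR = 3_600_000_000


def micros_since_epoch(ts: datetime.datetime) -> int:
    # Integer microseconds between a naive timestamp and the Unix epoch.
    return (ts - EPOCH) // datetime.timedelta(microseconds=1)


def hours_floor(ts: datetime.datetime) -> int:
    # Floor division: rounds toward negative infinity.
    return micros_since_epoch(ts) // MICROS_PER_HOUR


def hours_truncate(ts: datetime.datetime) -> int:
    # Truncating division: rounds toward zero (assumed behaviour of the other path).
    micros = micros_since_epoch(ts)
    quotient = abs(micros) // MICROS_PER_HOUR
    return quotient if micros >= 0 else -quotient


value = datetime.datetime(1969, 12, 31, 22, 1, 1)
print(hours_floor(value))     # -2
print(hours_truncate(value))  # -1
```

For timestamps at or after the epoch the two helpers agree, which matches the observation that only negative values disagree.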

Contributor Author

Got a fix here: apache/iceberg-rust#1146

try:
    import pyarrow as pa
except ModuleNotFoundError as e:
    raise ModuleNotFoundError("For bucket/truncate transforms, PyArrow needs to be installed") from e
Contributor

nit: change this error message since we use this for all pyarrow transforms
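
One way to address this nit, sketched under assumptions: a small helper that builds the message from a feature name, so the same import guard can serve every PyArrow-backed transform. `require_module` and its parameters are hypothetical and not part of PyIceberg; the message wording is illustrative only.

```python
import importlib
from types import ModuleType


def require_module(name: str, feature: str) -> ModuleType:
    # Import `name`, or raise an error naming the feature that needs it.
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as e:
        raise ModuleNotFoundError(f"For {feature}, {name} needs to be installed") from e
```

A call such as `require_module("pyarrow", "PyArrow-based transforms")` would then replace the hard-coded bucket/truncate wording.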

@Fokko Fokko changed the title from "Test with Iceberg-Rust" to "Leverage Iceberg-Rust for all the transforms" on Mar 28, 2025
2 participants