-
Notifications
You must be signed in to change notification settings - Fork 378
Write small decimals as INTs #1983
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
CI failing due to the release of snowballstemmer: https://pypi.org/project/snowballstemmer/#history |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the fix. Added a few notes to the review for my future self and other reviewers :)
with fo.create(overwrite=True) as fos: | ||
with pq.ParquetWriter(fos, schema=arrow_table.schema, **parquet_writer_kwargs) as writer: | ||
with pq.ParquetWriter( | ||
fos, schema=arrow_table.schema, store_decimal_as_integer=True, **parquet_writer_kwargs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""
By default, this is DISABLED and all decimal types annotate fixed_len_byte_array. When enabled, the writer will use the following physical types to store decimals:
- int32: for 1 <= precision <= 9.
- int64: for 10 <= precision <= 18.
- fixed_len_byte_array: for precision > 18.
"""
from https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetWriter.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this matches the parquet data type mapping for decimal
https://iceberg.apache.org/spec/#parquet
Looks like arrow doesnt have decimal32 support yet |
@kevinjqliu Good call, looking at the code, it seems like it will automatically map it to INT32/INT64: https://github.com/apache/arrow/blob/598938711a8376cbfdceaf5c77ab0fd5057e6c02/cpp/src/parquet/arrow/schema.cc#L380-L392 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for adding the comments.
I double checked that decimal128 also errors for precision >38
>>> pyarrow.decimal128(38+1, 0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pyarrow/types.pxi", line 4718, in pyarrow.lib.decimal128
File "pyarrow/types.pxi", line 4768, in pyarrow.lib.decimal128
ValueError: precision should be between 1 and 38
# Rationale for this change Resolves apache#1979 # Are these changes tested? # Are there any user-facing changes? <!-- In the case of user-facing changes, please add the changelog label. -->
Rationale for this change
Resolves #1979
Are these changes tested?
Are there any user-facing changes?