
Parquet reader forces Decimal32/64 columns to Decimal128 #8549

@scovich

Description


Describe the bug

The parquet reader always widens INT32 (Decimal) and INT64 (Decimal) columns to Decimal128. A reader might be able to request the narrower types explicitly via a read schema, but anyone who simply reads back what was written to disk gets a different type than what was written.

To Reproduce

Several unit tests expect the current behavior and started failing when I changed the parquet reader to preserve the narrower types (a standalone reproduction sketch follows the list):

test arrow::array_reader::primitive_array::tests::test_primitive_array_reader_decimal_types ... FAILED
test arrow::arrow_reader::tests::test_decimal ... FAILED
test arrow::arrow_reader::tests::test_arbitrary_decimal ... FAILED
test arrow::arrow_reader::tests::test_read_decimal_file ... FAILED
test arrow::arrow_writer::tests::arrow_writer_decimal128_dictionary ... FAILED
test arrow::arrow_writer::tests::arrow_writer_decimal ... FAILED
test arrow::arrow_writer::tests::arrow_writer_decimal256_dictionary ... FAILED
test arrow::arrow_writer::tests::arrow_writer_decimal64_dictionary ... FAILED
test arrow::schema::tests::test_arrow_schema_roundtrip ... FAILED
test arrow::schema::tests::test_decimal_fields ... FAILED
test arrow::schema::tests::test_column_desc_to_field ... FAILED
test statistics::test_decimal128 ... FAILED
test statistics::test_decimal_256 ... FAILED
test statistics::test_decimal64 ... FAILED
test statistics::test_data_page_stats_with_all_null_page ... FAILED
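
For a reproduction outside the test suite, a minimal sketch might look like the following. It assumes a recent arrow-rs with `Decimal32` support plus the `arrow-array`, `arrow-schema`, `bytes`, and `parquet` crates; the column name, precision, and scale are arbitrary choices for illustration.

```rust
use std::sync::Arc;

use arrow_array::{Decimal32Array, RecordBatch};
use arrow_schema::{DataType, Field, Schema};
use bytes::Bytes;
use parquet::arrow::{arrow_reader::ParquetRecordBatchReaderBuilder, ArrowWriter};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Declare and build a Decimal32(7, 2) column.
    let schema = Arc::new(Schema::new(vec![Field::new(
        "amount",
        DataType::Decimal32(7, 2),
        false,
    )]));
    let array = Decimal32Array::from(vec![12345, 67890]).with_precision_and_scale(7, 2)?;
    let batch = RecordBatch::try_new(schema.clone(), vec![Arc::new(array)])?;

    // Write to an in-memory parquet file; the column is stored as INT32 (Decimal).
    let mut buf = Vec::new();
    let mut writer = ArrowWriter::try_new(&mut buf, schema, None)?;
    writer.write(&batch)?;
    writer.close()?;

    // Read it back with the default (file-derived) schema.
    let reader = ParquetRecordBatchReaderBuilder::try_new(Bytes::from(buf))?.build()?;
    for batch in reader {
        // Currently prints Decimal128(7, 2), even though Decimal32(7, 2) was written.
        println!("{:?}", batch?.schema().field(0).data_type());
    }
    Ok(())
}
```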

Expected behavior

If a column was written as INT32 (Decimal) it should come back as Decimal32 unless the user requested something else via the read schema. Likewise, INT64 (Decimal) should come back as Decimal64.
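
As noted above, it might be possible to work around the current behavior by supplying an explicit read schema. Below is a hedged sketch of that approach using `ArrowReaderOptions::with_schema` on the bytes produced by the reproduction above; I have not verified that the reader actually accepts a Decimal32 override for an INT32 (Decimal) column today.

```rust
use std::sync::Arc;

use arrow_schema::{DataType, Field, Schema};
use bytes::Bytes;
use parquet::arrow::arrow_reader::{ArrowReaderOptions, ParquetRecordBatchReaderBuilder};

fn read_as_decimal32(buf: Vec<u8>) -> Result<(), Box<dyn std::error::Error>> {
    // Ask for Decimal32 explicitly instead of accepting the file-derived schema.
    let requested = Arc::new(Schema::new(vec![Field::new(
        "amount",
        DataType::Decimal32(7, 2),
        false,
    )]));
    let options = ArrowReaderOptions::new().with_schema(requested);
    let reader =
        ParquetRecordBatchReaderBuilder::try_new_with_options(Bytes::from(buf), options)?
            .build()?;
    for batch in reader {
        // If the override is accepted, the field comes back as Decimal32(7, 2).
        println!("{:?}", batch?.schema().field(0).data_type());
    }
    Ok(())
}
```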

Additional context

See #8540 (comment)

The variant shredding decimal integration tests fail by default, because they expect 32- and 64-bit decimals to round trip faithfully through parquet; I had to add a manual casting operation to the VariantArray constructor to compensate.

For the fix I tried (but eventually reverted in favor of the above-mentioned casting), see
https://github.com/apache/arrow-rs/pull/8540/files/cd978f5374503fff3a265bc06ad6ce143a4110f3..5b55f121d7461bd31dfc2d204f8fc303523a3988

The idea was to split the decimal_type helper (which only understood 128- and 256-bit decimals) into decimal_[32|64|128|256]_type helpers, where each helper's name indicates a lower bound on the resulting decimal's width. Too-high precision can still force a wider decimal, though it's not clear to me that this is actually correct -- is e.g. INT32 (Decimal(15, 2)) legal? A sketch of the split follows.
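
Here is a minimal sketch of what that split could look like; the helper names, error variant, and widening rule are my reading of the idea, not the code from the linked PR.

```rust
use arrow_schema::{ArrowError, DataType};

// Maximum decimal precision representable by each width, per the Arrow spec.
const DECIMAL32_MAX_PRECISION: u8 = 9;
const DECIMAL64_MAX_PRECISION: u8 = 18;
const DECIMAL128_MAX_PRECISION: u8 = 38;
const DECIMAL256_MAX_PRECISION: u8 = 76;

/// For a column stored as INT32 (Decimal): start at Decimal32, widen only if the
/// declared precision does not fit.
fn decimal_32_type(precision: u8, scale: i8) -> Result<DataType, ArrowError> {
    match precision {
        p if p <= DECIMAL32_MAX_PRECISION => Ok(DataType::Decimal32(precision, scale)),
        p if p <= DECIMAL64_MAX_PRECISION => Ok(DataType::Decimal64(precision, scale)),
        p if p <= DECIMAL128_MAX_PRECISION => Ok(DataType::Decimal128(precision, scale)),
        p if p <= DECIMAL256_MAX_PRECISION => Ok(DataType::Decimal256(precision, scale)),
        p => Err(ArrowError::ParquetError(format!(
            "decimal precision {p} exceeds the maximum supported precision"
        ))),
    }
}

/// For a column stored as INT64 (Decimal): start at Decimal64, widen only if needed.
fn decimal_64_type(precision: u8, scale: i8) -> Result<DataType, ArrowError> {
    match precision {
        p if p <= DECIMAL64_MAX_PRECISION => Ok(DataType::Decimal64(precision, scale)),
        p if p <= DECIMAL128_MAX_PRECISION => Ok(DataType::Decimal128(precision, scale)),
        p if p <= DECIMAL256_MAX_PRECISION => Ok(DataType::Decimal256(precision, scale)),
        p => Err(ArrowError::ParquetError(format!(
            "decimal precision {p} exceeds the maximum supported precision"
        ))),
    }
}
```

Whether decimal_32_type should silently widen for something like INT32 (Decimal(15, 2)), or reject it outright, is exactly the open question above.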
