refactor: Connector refactor #12

yingsu00 · 2025-05-21T08:15:24Z

No description provided.

fix decimal avg function precision issue

…file column type (facebookincubator#12350)" This reverts commit 5ad65e4.

The function toValues removes duplicate values from the vector and returns them in a std::vector. It was used to build an InPredicate. It will also be needed to build NOT IN filters for Iceberg equality delete reads, so this commit moves it from velox/functions/prestosql/InPredicate.cpp to velox/type/Filter.h. This commit also renames it to deDuplicateValues to make its purpose clearer.

This commit introduces EqualityDeleteFileReader, which is used to read Iceberg splits that have equality delete files. The equality delete files are read to construct domain filters or filter functions, which are then evaluated in the base file readers.

When there is only one equality delete field and that field is an Iceberg identifier field (i.e. a non-floating-point primitive type), the values are converted to a list and used as a NOT IN domain filter, with NULL treated separately. This domain filter is then pushed down to the ColumnReaders to filter out unwanted rows before they are read into Velox vectors.

When the equality delete column is a nested column, e.g. a sub-column in a struct or the key in a map, that column may not be in the base file ScanSpec. These subfields need to be added to, or removed from, the SchemaWithId and ScanSpec recursively if they are not already in the ScanSpec. A test is also added for this case.

If there is more than one equality delete field, or the field is not an Iceberg identifier field, the values are converted to a typed expression in conjunction-of-disjunctions form. This expression is evaluated as the remaining filter function after the rows are read into Velox vectors. Note that this only works for Presto for now, as the "neq" function is not registered by Spark. See https://github.com/facebookincubator/issues/12667.

Note that this commit only supports integral types. VARCHAR and VARBINARY need to be supported in future commits (see facebookincubator#12664).

Co-authored-by: Naveen Kumar Mahadevuni <[email protected]>

rui-mo and others added 10 commits May 14, 2025 18:17

[5962] Support struct schema evolution matching by name

8b8ce6b

[6020] Spark SQL avg agg function: support decimal

15a4ca9


[oap] Register merge extract companion agg functions without suffix

86c480c

[11067] Support scan filter for decimal in ORC

4331529

[11771] [11772] Fix SMJ result mismatch issue

eee1cf6

Revert "fix(parquet): Avoid SEGV if table column type does not match file column type (facebookincubator#12350)"

d041885

This reverts commit 5ad65e4.

Rename makeColumnHandle and makeTableHandle

bde260e

Refactor Velox connectors

2049f50

yingsu00 force-pushed the connector_refactor branch from 0fe2d6a to 2049f50 May 21, 2025 09:02

zhouyuan force-pushed the main branch 2 times, most recently from 7a6cc63 to 9bd5c27 June 18, 2025 14:37

zhouyuan force-pushed the main branch from 9bd5c27 to 2badf28 June 20, 2025 15:11
