Apache Iceberg version
Most recent PyIceberg
Please describe the bug 🐞
See here and the description below for a failing test.
table = catalog.load_table(f"default.{identifier}")
scan = table.scan()
# assert len(scan.to_arrow()) > 0
scan = scan.filter("ts >= '2023-03-05T00:00:00+00:00'")
assert len(scan.to_arrow()) > 0
This code works fine as written, but uncommenting the first assertion causes the subsequent filter
call to raise a TypeError. The stack trace is immediately helpful:
pyiceberg/table/__init__.py:1710: in filter
return self.update(row_filter=And(self.row_filter, _parse_row_filter(expr)))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <pyiceberg.table.DataScan object at 0x11c065cd0>
overrides = {'row_filter': GreaterThanOrEqual(term=Reference(name='ts'), literal=literal('2023-03-05T00:00:00+00:00'))}
    def update(self: S, **overrides: Any) -> S:
        """Create a copy of this table scan with updated fields."""
>       return type(self)(**{**self.__dict__, **overrides})
E       TypeError: TableScan.__init__() got an unexpected keyword argument 'partition_filters'
pyiceberg/table/__init__.py:1694: TypeError
DataScan has a cached_property, partition_filters (see here), that will turn up in self.__dict__ below in the update method:
iceberg-python/pyiceberg/table/__init__.py
Lines 1692 to 1694 in 045dd10
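This happens once the cached property has been accessed, i.e. if plan_files has already been
called on the scan (essentially, if it has been read). functools.cached_property stores its computed value in the instance's __dict__ under the property name, so update then forwards that name to __init__ as a keyword argument.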
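To see the mechanism without a catalog, here is a minimal sketch using a toy class. ToyScan, update_naive, and update_safe are made-up names for illustration, not PyIceberg APIs. update_naive mirrors the update line from the trace. update_safe is just one possible workaround (forwarding only the keys that __init__ accepts) and not necessarily the fix the maintainers would choose.

```python
import inspect
from functools import cached_property
from typing import Any, Dict, Optional


class ToyScan:
    """Toy stand-in for a table scan (hypothetical, not PyIceberg's class)."""

    def __init__(self, row_filter: str = "true", limit: Optional[int] = None) -> None:
        self.row_filter = row_filter
        self.limit = limit

    @cached_property
    def partition_filters(self) -> Dict[str, str]:
        # After the first access, functools stores this value in
        # self.__dict__["partition_filters"].
        return {"derived_from": self.row_filter}

    def update_naive(self, **overrides: Any) -> "ToyScan":
        # Same pattern as in the trace: forwards everything in __dict__,
        # including any cached property values, to __init__.
        return type(self)(**{**self.__dict__, **overrides})

    def update_safe(self, **overrides: Any) -> "ToyScan":
        # Assumed workaround: keep only the keys that __init__ accepts,
        # so cached values are dropped and recomputed on the copy.
        accepted = inspect.signature(type(self).__init__).parameters
        state = {k: v for k, v in self.__dict__.items() if k in accepted}
        return type(self)(**{**state, **overrides})


scan = ToyScan()
scan.update_naive(row_filter="ts >= x")  # fine: cache not populated yet
_ = scan.partition_filters               # populates the cache in __dict__
try:
    scan.update_naive(row_filter="ts >= x")
except TypeError as exc:
    print(f"naive update failed: {exc}")
fresh = scan.update_safe(row_filter="ts >= x")  # succeeds
```

Dropping the cached value on copy also matters for correctness here: the copy has a different row_filter, so a cached partition_filters carried over from the original would be stale.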
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time