-
Notifications
You must be signed in to change notification settings - Fork 378
Add snapshot-loading-mode
option to RESTCatalog
#1998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This will allow to set the snapshots to be send back. In case of refs, only the snapshots referenced by branches or tags will be returned: https://github.com/apache/iceberg/blob/5d2230ead79da64a8c871a02eb1304a94aaece5c/open-api/rest-catalog-open-api.yaml#L954-L956
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
if mode := self.properties.get(SNAPSHOT_LOADING_MODE): | ||
if mode in {"all", "refs"}: | ||
params["snapshots"] = mode | ||
else: | ||
raise ValueError("Invalid snapshot-loading-mode: {}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def load_table(self, identifier: Union[str, Identifier]) -> Table: | ||
response = self._session.get(self.url(Endpoints.load_table, prefixed=True, **self._split_identifier_for_path(identifier))) | ||
params = {} | ||
if mode := self.properties.get(SNAPSHOT_LOADING_MODE): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to add this as a parameter to load_table that takes preference over session-level properties? I guess it's not part of the base catalog interface but 🤷🏻♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good seeing you here @corleyma, and I share your concern here. In some situations, we want to override this as well:
- To enable time-travel.
- In the case of a write, we want to know about all the snapshots to see if there are any conflicts.
I'm open to adding an option to the base interface, however, it does not make a lot of sense in the case of non-REST catalogs where we have to read the JSON anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for coming back to this late.
Maybe the base load_table interface should just change to accept kwargs, and we can add here some implementation-specific kwargs for REST catalog?
<!-- Thanks for opening a pull request! --> <!-- In the case this PR will resolve an issue, please replace ${GITHUB_ISSUE_ID} below with the actual Github issue id. --> <!-- Closes #${GITHUB_ISSUE_ID} --> # Rationale for this change This allows users only to fetch the snapshots that are referenced by a tag or a branch. # Are these changes tested? # Are there any user-facing changes? <!-- In the case of user-facing changes, please add the changelog label. -->
Rationale for this change
This allows users only to fetch the snapshots that are referenced by a tag or a branch.
Are these changes tested?
Are there any user-facing changes?