-
Notifications
You must be signed in to change notification settings - Fork 48
feat: add DefaultIndexKind.NULL
to use as index_col
in read_gbq*
, creating an indexless DataFrame/Series
#662
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
bigframes/core/blocks.py
Outdated
@@ -2113,6 +2119,10 @@ def __repr__(self) -> str: | |||
|
|||
def to_pandas(self) -> pd.Index: | |||
"""Executes deferred operations and downloads the results.""" | |||
if len(self.column_ids) == 0: | |||
raise bigframes.exceptions.NullIndexError( | |||
"Cannot perform this operation without an index. Set an index using set_index." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's be more specfic about the operation. As a user I would appreciate knowing why can't do this. Seems a bit obvious that we can't get a pandas index because there isn't an index, but let's spell it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rewrote message
bigframes/session/__init__.py
Outdated
# ---------------------------------------------------- | ||
|
||
if not index_col and len(index_cols) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reads odd to me: len(index_cols) == 0
implies not index_col
. Could you rephrase what you're trying to check here? I assume you're trying to exclude the DefaultIndexKind enum from this check. Let's be explicit about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, basically, trying to fall back to sequential index if don't have null index, user provided index columns, or metadata-derived index columns. Rewrote the condition though with the code structure, its still a bit weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are your overall thoughts on testing? How can we be confident that empty/null index works? As we add operations that should support null/empty index, we add tests here and in the usual location?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think it will be tricky, as null index tests will require manually configured expectations for many tests, as we cannot always compare against pandas (which doesn't have null index). So yeah, I think we will be stuck with a parallel test suite, which will be a bit burdensome to maintain. Being pandas-equivalent has been a huge boon for testing thus far.
bigframes/core/blocks.py
Outdated
and (sort is False) | ||
and (block_identity_join is False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these are only bool per type checking, we don't have to worry about any false-y values to mess up "not"
and (sort is False) | |
and (block_identity_join is False) | |
and not sort | |
and not block_identity_join |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Doesn't block_identity_join
imply indexless join is allowed? What's this checking for?
Edit: found it
allow_row_identity_join=(not block_identity_join), |
Apparently block_identity_join
is the opposite of what I thought it is. Please add a docstring to this function explaining what these args mean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added docstring
Co-authored-by: Tim Sweña (Swast) <[email protected]>
Co-authored-by: Tim Sweña (Swast) <[email protected]>
Co-authored-by: Tim Sweña (Swast) <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Test failure |
DefaultIndexKind.NULL
to use as index_col
in read_gbq*
, creating an indexless DataFrame/Series
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕