feat: add `DefaultIndexKind.NULL` to use as `index_col` in `read_gbq*`, creating an indexless DataFrame/Series #662

TrevorBergeron · 2024-05-06T23:23:09Z

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
Ensure the tests and linter pass
Code coverage does not decrease (if any source code was changed)
Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

bigframes/core/__init__.py

bigframes/core/blocks.py

tswast · 2024-05-15T15:31:30Z

bigframes/core/blocks.py

@@ -2113,6 +2119,10 @@ def __repr__(self) -> str:

    def to_pandas(self) -> pd.Index:
        """Executes deferred operations and downloads the results."""
+        if len(self.column_ids) == 0:
+            raise bigframes.exceptions.NullIndexError(
+                "Cannot perform this operation without an index. Set an index using set_index."


Let's be more specific about the operation. As a user, I would appreciate knowing why I can't do this. It seems a bit obvious that we can't get a pandas index because there isn't an index, but let's spell it out.

rewrote message
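
For illustration only, a minimal sketch of what a more specific message could look like. The `NullIndexError` class below is a local stand-in (its base class is an assumption), `IndexSketch` is a hypothetical wrapper rather than the real class in `blocks.py`, and the message wording is not the text that was actually committed.

```python
class NullIndexError(ValueError):
    """Local stand-in for bigframes.exceptions.NullIndexError (base class assumed)."""


class IndexSketch:
    """Hypothetical, minimal stand-in for the index wrapper touched in the diff."""

    def __init__(self, column_ids):
        self.column_ids = tuple(column_ids)

    def to_pandas(self):
        # Name the operation that failed and the reason, then point to the remedy.
        if len(self.column_ids) == 0:
            raise NullIndexError(
                "Cannot download a pandas Index: this object has a null index, "
                "so there are no index columns to return. "
                "Set an index using set_index."
            )
        # Placeholder for the real execute-and-download step.
        return list(self.column_ids)
```

With this shape, a caller who hits the error learns both which operation was rejected and how to recover.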

bigframes/core/rewrite.py

tswast · 2024-05-15T15:44:45Z

bigframes/session/__init__.py

        # ----------------------------------------------------

+        if not index_col and len(index_cols) == 0:


This reads odd to me: len(index_cols) == 0 implies not index_col. Could you rephrase what you're trying to check here? I assume you're trying to exclude the DefaultIndexKind enum from this check. Let's be explicit about that.

Yeah, basically, I'm trying to fall back to a sequential index if we don't have a null index, user-provided index columns, or metadata-derived index columns. I rewrote the condition, though with the code structure it's still a bit weird.
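
One way to make that fallback rule explicit, as a hedged sketch rather than the code that landed: the enum below is a local stand-in for `DefaultIndexKind`, the helper name `needs_sequential_fallback` is hypothetical, and `index_cols` is assumed to already hold the resolved columns (user-provided or metadata-derived).

```python
import enum
from typing import Sequence, Union


class DefaultIndexKind(enum.Enum):
    """Local stand-in for the enum named in the PR title."""

    SEQUENTIAL_INT64 = enum.auto()
    NULL = enum.auto()


def needs_sequential_fallback(
    index_col: Union[DefaultIndexKind, Sequence[str]],
    index_cols: Sequence[str],
) -> bool:
    """Return True only when nothing else determines the index."""
    # An explicitly requested null index must never be replaced by a default.
    if index_col is DefaultIndexKind.NULL:
        return False
    # Otherwise fall back only if no index columns were resolved, whether
    # from the user's argument or from table metadata.
    return len(index_cols) == 0
```

Checking the enum member by identity keeps the `DefaultIndexKind` case separate from the "no columns" case, which is the distinction the review comment asks for.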

tests/system/small/test_empty_index.py

tswast · 2024-05-15T15:48:20Z

tests/system/small/test_empty_index.py

What are your overall thoughts on testing? How can we be confident that empty/null index works? As we add operations that should support null/empty index, we add tests here and in the usual location?

Hmm, I think it will be tricky, as null index tests will require manually configured expectations for many tests, as we cannot always compare against pandas (which doesn't have null index). So yeah, I think we will be stuck with a parallel test suite, which will be a bit burdensome to maintain. Being pandas-equivalent has been a huge boon for testing thus far.

bigframes/core/blocks.py

tswast · 2024-05-17T17:36:01Z

bigframes/core/blocks.py

+            and (sort is False)
+            and (block_identity_join is False)


Since these are only bool per type checking, we don't have to worry about any falsy values messing up `not`.

Suggested change

-            and (sort is False)
-            and (block_identity_join is False)
+            and not sort
+            and not block_identity_join

Question: Doesn't block_identity_join imply indexless join is allowed? What's this checking for?

Edit: found it

python-bigquery-dataframes/bigframes/core/blocks.py

Line 2292 in ca284cc

allow_row_identity_join=(not block_identity_join),

Apparently block_identity_join is the opposite of what I thought it was. Please add a docstring to this function explaining what these args mean.

added docstring
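
As a small illustration of why the `is False` and `not` spellings are interchangeable only for real booleans, the sketch below compares the two checks on a few values. The helper names are hypothetical and not part of the PR.

```python
def is_false(value) -> bool:
    # Identity check: True only for the bool singleton False.
    return value is False


def is_falsy(value) -> bool:
    # Truthiness check: True for False, 0, None, "", empty containers, etc.
    return not value


# The two checks agree on bools and disagree on other falsy values.
for value in (False, True, 0, None, "", []):
    print(repr(value), is_false(value), is_falsy(value))
```

When the arguments are annotated as `bool` and type checking enforces that, only the first two rows of this comparison can occur, so `not sort` and `sort is False` behave the same.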

bigframes/core/rewrite.py

bigframes/session/__init__.py


tswast

Thanks!

tswast · 2024-05-20T20:49:46Z

Test failure FAILED tests/system/large/test_remote_function.py::test_remote_function_via_session_vpc_invalid appears to be unrelated.

feat: Support indexless dataframe/series

aaa545b

product-auto-label bot added the size: l and api: bigquery labels May 6, 2024

TrevorBergeron added 8 commits May 7, 2024 19:44

Merge remote-tracking branch 'github/main' into null_index

7f11946

fixes for kurt, skew, median

9a5b212

fix unit tests

0248150

Merge remote-tracking branch 'github/main' into null_index

26e2d4f

fix more issues

16e292b

fix defaulting to primary key logic

5611a86

Merge remote-tracking branch 'github/main' into null_index

8caa068

fix tests

ea9b120

TrevorBergeron marked this pull request as ready for review May 10, 2024 16:22

TrevorBergeron requested review from a team as code owners May 10, 2024 16:22

TrevorBergeron requested a review from tswast May 10, 2024 16:22

blunderbuss-gcf bot assigned SalemJorden May 10, 2024

tswast requested changes May 15, 2024


TrevorBergeron added 5 commits May 15, 2024 19:35

Merge remote-tracking branch 'github/main' into null_index

88fc037

many small changes

27d6f47

fix accidental null indexes and raising warning

75b1fd1

Merge remote-tracking branch 'github/main' into null_index

0b26bbb

fix df quantile index

7142078

TrevorBergeron requested a review from tswast May 16, 2024 17:29

TrevorBergeron added 3 commits May 17, 2024 02:01

Merge remote-tracking branch 'github/main' into null_index

7b5f4f6

disable legacy pandas for some tests, add concat test

bc28bd4

fix series repr

bd0aa12

tswast reviewed May 17, 2024


TrevorBergeron and others added 3 commits May 17, 2024 11:15

Update bigframes/session/__init__.py

5efcc27

Co-authored-by: Tim Sweña (Swast) <[email protected]>

Update bigframes/core/rewrite.py

4b487e7

Co-authored-by: Tim Sweña (Swast) <[email protected]>

Update bigframes/core/rewrite.py

3892241

Co-authored-by: Tim Sweña (Swast) <[email protected]>

gcf-owl-bot bot and others added 4 commits May 17, 2024 18:17

🦉 Updates from OwlBot post-processor

09af424

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

🦉 Updates from OwlBot post-processor

1164faf

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

Merge branch 'null_index' of https://github.com/googleapis/python-big…

8844f27

…query-dataframes into null_index

pr comments addressed

600d500

tswast approved these changes May 20, 2024


tswast changed the title ~~feat: Support indexless dataframe/series~~ feat: add DefaultIndexKind.NULL to use as index_col in read_gbq*, creating an indexless DataFrame/Series May 20, 2024

tswast merged commit 29e4886 into main May 20, 2024

tswast deleted the null_index branch May 20, 2024 20:51

release-please bot mentioned this pull request May 20, 2024

chore(main): release 1.7.0 #685

Merged
