feat: add DefaultIndexKind.NULL to use as index_col in read_gbq*, creating an indexless DataFrame/Series #662

Merged: 24 commits merged on May 20, 2024
Changes from 1 commit
Commits (24):
aaa545b feat: Support indexless dataframe/series (TrevorBergeron, May 6, 2024)
7f11946 Merge remote-tracking branch 'github/main' into null_index (TrevorBergeron, May 7, 2024)
9a5b212 fixes for kurt, skew, median (TrevorBergeron, May 8, 2024)
0248150 fix unit tests (TrevorBergeron, May 8, 2024)
26e2d4f Merge remote-tracking branch 'github/main' into null_index (TrevorBergeron, May 8, 2024)
16e292b fix more issues (TrevorBergeron, May 8, 2024)
5611a86 fix defaulting to primary key logic (TrevorBergeron, May 8, 2024)
8caa068 Merge remote-tracking branch 'github/main' into null_index (TrevorBergeron, May 9, 2024)
ea9b120 fix tests (TrevorBergeron, May 9, 2024)
88fc037 Merge remote-tracking branch 'github/main' into null_index (TrevorBergeron, May 15, 2024)
27d6f47 many small changes (TrevorBergeron, May 15, 2024)
75b1fd1 fix accidental null indexes and raising warning (TrevorBergeron, May 16, 2024)
0b26bbb Merge remote-tracking branch 'github/main' into null_index (TrevorBergeron, May 16, 2024)
7142078 fix df quantile index (TrevorBergeron, May 16, 2024)
7b5f4f6 Merge remote-tracking branch 'github/main' into null_index (TrevorBergeron, May 17, 2024)
bc28bd4 disable legacy pandas for some tests, add concat test (TrevorBergeron, May 17, 2024)
bd0aa12 fix series repr (TrevorBergeron, May 17, 2024)
5efcc27 Update bigframes/session/__init__.py (TrevorBergeron, May 17, 2024)
4b487e7 Update bigframes/core/rewrite.py (TrevorBergeron, May 17, 2024)
3892241 Update bigframes/core/rewrite.py (TrevorBergeron, May 17, 2024)
09af424 🦉 Updates from OwlBot post-processor (gcf-owl-bot[bot], May 17, 2024)
1164faf 🦉 Updates from OwlBot post-processor (gcf-owl-bot[bot], May 17, 2024)
8844f27 Merge branch 'null_index' of https://github.com/googleapis/python-big… (gcf-owl-bot[bot], May 17, 2024)
600d500 pr comments addressed (TrevorBergeron, May 17, 2024)
fix more issues
TrevorBergeron committed May 8, 2024
commit 16e292ba7c07742818a310367b3184fb67cb31a6
10 changes: 7 additions & 3 deletions bigframes/series.py
@@ -978,9 +978,13 @@ def quantile(self, q: Union[float, Sequence[float]] = 0.5) -> Union[Series, floa
         qs = tuple(q) if utils.is_list_like(q) else (q,)
         result = block_ops.quantile(self._block, (self._value_column,), qs=qs)
         if utils.is_list_like(q):
-            result = result.stack()
-            result = result.drop_levels([result.index_columns[0]])
-            return Series(result)
+            # Drop the first level, since only one column
+            result = result.with_column_labels(result.column_labels.droplevel(0))
+            result, index_col = result.create_constant(self.name, None)
+            result = result.set_index([index_col])
+            return Series(
+                result.transpose(original_row_index=pandas.Index([self.name]))
+            )
         else:
             return cast(float, Series(result).to_pandas().squeeze())
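For a list-like `q`, the new branch rebuilds the result so that it appears to come back as a Series named after the source column (the `create_constant(self.name, None)` and `transpose(original_row_index=pandas.Index([self.name]))` steps), while a scalar `q` still returns a plain float. A minimal pure-Python sketch of those two return shapes follows, assuming pandas' default linear interpolation. `linear_quantile` and `series_quantile` are hypothetical helpers that model the intended output; they are not bigframes code.

```python
import math
from typing import Dict, Hashable, Sequence, Tuple, Union


def linear_quantile(values: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation between the two nearest ranks."""
    if not values:
        raise ValueError("quantile of an empty sequence is undefined")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q  # fractional rank of the requested quantile
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


def series_quantile(
    values: Sequence[float],
    name: Hashable,
    q: Union[float, Sequence[float]] = 0.5,
) -> Union[float, Tuple[Hashable, Dict[float, float]]]:
    """Hypothetical model of the two return shapes of Series.quantile."""
    # Simplified stand-in for utils.is_list_like.
    if isinstance(q, (list, tuple)):
        # List-like q: one value per requested quantile, labeled by q and
        # carrying the source series' name along.
        return name, {qi: linear_quantile(values, qi) for qi in q}
    # Scalar q: a bare float, like the `else` branch in the diff.
    return linear_quantile(values, q)
```

Under these assumptions, `series_quantile([1, 2, 3, 4], "x", [0.25, 0.5])` returns `('x', {0.25: 1.75, 0.5: 2.5})`, the same values and name that `pandas.Series([1, 2, 3, 4], name="x").quantile([0.25, 0.5])` produces.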

5 changes: 4 additions & 1 deletion bigframes/session/__init__.py
@@ -812,9 +812,12 @@ def _read_gbq_table(
         )

         # ----------------------------------------------------
-        # Create Block & default index if len(index_cols) == 0
+        # Create Default Index if DefaultIndexKind provided, or no index provided
         # ----------------------------------------------------

+        if not index_col:
+            index_col = bigframes.enums.DefaultIndexKind.SEQUENTIAL_INT64
+
         index_names: Sequence[Hashable] = index_cols
         if index_col == bigframes.enums.DefaultIndexKind.SEQUENTIAL_INT64:
             sequential_index_col = bigframes.core.guid.generate_guid("index_")
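With this guard, a missing or empty `index_col` is rewritten to `DefaultIndexKind.SEQUENTIAL_INT64` before the following branch runs, so "no index given" and "sequential index requested" share one code path. The sketch below models that dispatch together with the `NULL` kind named in the PR title. The `DefaultIndexKind` enum here is a hypothetical stand-in for `bigframes.enums.DefaultIndexKind` (only two members modeled), and `resolve_index` and its return tuples are illustrative, not the session's actual logic.

```python
import enum
from typing import List, Sequence, Tuple, Union


class DefaultIndexKind(enum.Enum):
    """Hypothetical stand-in: only the two kinds discussed in this PR."""
    SEQUENTIAL_INT64 = enum.auto()
    NULL = enum.auto()


IndexCol = Union[str, Sequence[str], DefaultIndexKind, None]


def resolve_index(index_col: IndexCol) -> Tuple[str, List[str]]:
    """Return (kind, columns) describing how an index would be built."""
    # Same guard as the diff: None, "" or an empty sequence means "use default".
    if not index_col:
        index_col = DefaultIndexKind.SEQUENTIAL_INT64
    if index_col == DefaultIndexKind.SEQUENTIAL_INT64:
        return ("sequential", [])  # generate a 0..n-1 index column
    if index_col == DefaultIndexKind.NULL:
        return ("null", [])  # indexless DataFrame/Series
    if isinstance(index_col, str):
        return ("columns", [index_col])  # a single named column
    return ("columns", list(index_col))  # several named columns
```

In this sketch, plain `Enum` members are always truthy, so the `not index_col` check only catches `None` and empty strings or sequences; an explicit `DefaultIndexKind.NULL` passes through untouched.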
2 changes: 1 addition & 1 deletion tests/system/small/test_empty_index.py
Collaborator:

What are your overall thoughts on testing? How can we be confident that the empty/null index works? As we add operations that should support a null/empty index, do we add tests both here and in the usual location?

Contributor Author:

Hmm, I think it will be tricky: null index tests will require manually configured expectations for many tests, since we cannot always compare against pandas (which doesn't have a null index). So yeah, I think we will be stuck with a parallel test suite, which will be a bit burdensome to maintain. Being pandas-equivalent has been a huge boon for testing thus far.
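To illustrate the "manually configured expectations" point: with no pandas counterpart to compare against, each assertion about an indexless result has to be written out by hand, including the absence of an index. The sketch below uses a hypothetical toy `IndexlessSeries` class (not bigframes) purely to show the shape of such a test.

```python
from dataclasses import dataclass, field
from typing import Hashable, List, Optional


@dataclass
class IndexlessSeries:
    """Hypothetical toy container: values and a name, but no index at all."""
    name: Optional[Hashable]
    values: List[int] = field(default_factory=list)
    index: None = None  # nothing to align on or to print as row labels

    def head(self, n: int = 5) -> "IndexlessSeries":
        # Keeps the first n values; there are no row labels to carry along.
        return IndexlessSeries(self.name, self.values[:n])


def test_head_on_indexless_series() -> None:
    s = IndexlessSeries("int64_too", [10, 20, 30, 40, 50, 60])
    result = s.head(3)
    # Hand-written expectations instead of `assert result == pandas_result`.
    assert result.values == [10, 20, 30]
    assert result.name == "int64_too"
    assert result.index is None
```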

@@ -21,7 +21,7 @@ def test_empty_index_series_repr(
     pd_result = (
         scalars_pandas_df_default_index["int64_too"]
         .head(5)
-        .to_string(dtype=True, index=False, name=True, length=True)
+        .to_string(dtype=True, index=False, name=True)
     )
     assert bf_result == pd_result

4 changes: 2 additions & 2 deletions tests/system/small/test_series.py
@@ -1429,13 +1429,13 @@ def test_numeric_literal(scalars_dfs):
     assert bf_result.dtype == pd.ArrowDtype(pa.decimal128(38, 9))


-def test_repr(scalars_dfs):
+def test_series_small_repr(scalars_dfs):
     scalars_df, scalars_pandas_df = scalars_dfs

     col_name = "int64_col"
     bf_series = scalars_df[col_name]
     pd_series = scalars_pandas_df[col_name]
-    assert repr(bf_series) == repr(pd_series)
+    assert repr(bf_series) == pd_series.to_string(length=True, dtype=True, name=True)


 def test_sum(scalars_dfs):
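Both test edits hinge on the `length` argument of pandas' `Series.to_string`, which controls the `Length: n` entry in the footer. The sketch below models that footer for plain (non-categorical) dtypes, assuming pandas' ordering (name, then length, then dtype) and assuming that `length="truncate"`, the default of the `display.show_dimensions` option used by `repr`, prints the length only when rows were elided. `format_footer` is a hypothetical helper, not a pandas function.

```python
from typing import Hashable, List, Optional, Union


def format_footer(
    name: Optional[Hashable],
    n_rows: int,
    dtype_name: Optional[str],
    length: Union[bool, str] = "truncate",
    truncated: bool = False,
) -> str:
    """Hypothetical model of a footer like 'Name: x, Length: 5, dtype: Int64'."""
    parts: List[str] = []
    if name is not None:
        parts.append(f"Name: {name}")
    # True always prints the row count; "truncate" prints it only when the
    # output was cut short; False never prints it.
    if length is True or (length == "truncate" and truncated):
        parts.append(f"Length: {n_rows}")
    if dtype_name:
        parts.append(f"dtype: {dtype_name}")
    return ", ".join(parts)
```

Under these assumptions, a small, untruncated series gains a `Length:` entry with `length=True` that its default `repr` would omit, which is consistent with `test_series_small_repr` now comparing against `to_string(length=True, ...)` instead of `repr(pd_series)`. Conversely, `test_empty_index_series_repr` now builds its expectation with `to_string`'s default `length=False`, so no `Length:` entry is expected there.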