feat: add pd.get_dummies #149

Merged: milkshakeiii merged 28 commits into main from b297352026-get-dummies on Nov 1, 2023

Conversation

@milkshakeiii (Contributor) commented Oct 26, 2023

  • Emulates most aspects of the pandas get_dummies interface.
  • Adds tests and doctest examples.
  • In most cases, the performance bottleneck is the BigQuery column count.
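
For reference, the pandas behavior being emulated is a one-hot encoding: each distinct value in a column becomes its own indicator column, so the output width grows with the number of distinct values. Below is a minimal pure-Python sketch of that encoding. It is not the code in this PR: the function name `one_hot_columns` and its list-based input are hypothetical, and only the parameter names `prefix`, `prefix_sep`, `dummy_na`, and `drop_first` are borrowed from pandas.

```python
from typing import Dict, Hashable, List, Optional


def one_hot_columns(
    values: List[Optional[Hashable]],
    prefix: str,
    prefix_sep: str = "_",
    dummy_na: bool = False,
    drop_first: bool = False,
) -> Dict[str, List[bool]]:
    """Hypothetical helper: one boolean column per distinct non-null value."""
    # Collect the distinct non-null values; sorting gives a deterministic
    # column order (this sketch assumes the values are mutually comparable).
    levels = sorted({v for v in values if v is not None})
    if drop_first:
        # Drop the first level to avoid a redundant indicator column.
        levels = levels[1:]
    # One output column per remaining level, named "<prefix><sep><level>".
    result = {
        f"{prefix}{prefix_sep}{level}": [v == level for v in values]
        for level in levels
    }
    if dummy_na:
        # Optional extra column flagging missing values (None in this sketch).
        result[f"{prefix}{prefix_sep}nan"] = [v is None for v in values]
    return result


encoded = one_hot_columns(["a", "b", None, "a"], prefix="col")
for name, column in encoded.items():
    print(name, column)
```

With the input above, the sketch yields two columns, `col_a` and `col_b`. A column with many distinct values yields correspondingly many output columns, which is consistent with the note about column count being the bottleneck.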

@milkshakeiii requested review from a team as code owners October 26, 2023 06:39
@milkshakeiii requested a review from shobsi October 26, 2023 06:39
@product-auto-label bot added the labels "size: l" (pull request size is large) and "api: bigquery" (issues related to the googleapis/python-bigquery-dataframes API) Oct 26, 2023
@milkshakeiii requested a review from shobsi October 27, 2023 19:19
@milkshakeiii (Contributor, Author) commented

Thanks for the review! Working on addressing these comments today.

@TrevorBergeron (Contributor) left a comment

Logic looks good. Just a few suggestions on cutting down the method length a bit.

@milkshakeiii merged commit d8baad5 into main Nov 1, 2023
@milkshakeiii deleted the b297352026-get-dummies branch November 1, 2023 20:41
ashleyxuu pushed a commit that referenced this pull request Nov 1, 2023
* feat: add pd.get_dummies

* remove unneeded prefix case

* param/documentation fixes

* be stricter about types in test

* be stricter about types in series test

* remove unneeded comment

* adjust for type difference in pandas 1

* add example code (tested)

* fix None columns and add test cases

* variable names and _get_unique_values per-column

* account for pandas 1 behavior difference

* remove already_seen set

* avoid unnecessary join/projection

* fix column ordering edge case

* adjust for picky examples checker

* example tweak

* make part of the example comments

* use ellipsis in doctest comment

* add <BLANKLINES> to doctest string

* extract parameter standardization

* extract submethods

---------

Co-authored-by: Henry J Solberg <[email protected]>
Labels
  • api: bigquery (issues related to the googleapis/python-bigquery-dataframes API)
  • size: l (pull request size is large)
4 participants