CLN: Use `to_dataframe` to download query results. #247
Merged
19 commits
- f26e390 CLN: Use `to_dataframe` to download query results. (tswast)
- 7043501 Supply expected dtypes to to_dataframe() (tswast)
- b9f931d Bump miniimum google-cloud-bigquery version for dtypes argument. (tswast)
- f805dba Update tests to match dtypes from to_dataframe(). (tswast)
- 275ea26 Cast to correct dtype in empty dataframes. (tswast)
- 90eb9fe Blacken (tswast)
- 013b00f Blacken benchmark. (tswast)
- 5a526c0 Remove timezone from datetime tests. (tswast)
- d8e3b99 Blacken tests. (tswast)
- 1e4009a Update docs for minimum google-cloud-bigquery version. (tswast)
- 827a065 Update version number in unit tests. (tswast)
- 159bda0 Update dependencies for tests. (tswast)
- 0ecce9b Fix lint error. (tswast)
- b80cf5c Specify column order on empty DataFrame. (tswast)
- 20dd01a Don't wipe out conda dependencies. (tswast)
- 59a9328 Add pydata-google-auth to conda deps. (tswast)
- d7c1ca5 Document change in behavior of TIMEZONE columns. (tswast)
- 5b1bd0a Use timezone-naive datetime64 dtype for TIMESTAMP (tswast)
- 094058b Blacken (tswast)
```diff
@@ -0,0 +1,16 @@
+# pandas-gbq benchmarks
+
+This directory contains a few scripts which are useful for performance
+testing the pandas-gbq library. Use cProfile to time the script and see
+details about where time is spent. To avoid timing how long BigQuery takes to
+execute a query, run the benchmark twice to ensure the results are cached.
+
+## `read_gbq`
+
+Read a small table (a few KB).
+
+python -m cProfile --sort=cumtime read_gbq_small_results.py
+
+Read a large-ish table (100+ MB).
+
+python -m cProfile --sort=cumtime read_gbq_large_results.py
```
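The same cumulative-time view can also be produced from inside Python, which may help when only part of a script should be profiled. The following is a minimal sketch, not part of this pull request: `download_results` and `profile_by_cumtime` are hypothetical names, and the workload is a CPU-only stand-in so the example runs without BigQuery credentials.

```python
import cProfile
import io
import pstats


def download_results():
    # Hypothetical stand-in for a benchmark body such as pandas_gbq.read_gbq(...).
    # It only burns CPU so the sketch runs without any network access.
    return sum(i * i for i in range(100_000))


def profile_by_cumtime(func, limit=10):
    """Run func under cProfile; return its result and a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)

    # Write the report into a string buffer instead of stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "cumtime" is the same sort key as the --sort=cumtime command-line flag.
    stats.sort_stats("cumtime").print_stats(limit)
    return result, buffer.getvalue()


result, report = profile_by_cumtime(download_results)
print(report)
```

Replacing the stand-in with a real query call would include network and query time in the profile, so the README's advice to run twice and rely on cached results still applies.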
```diff
@@ -0,0 +1,8 @@
+import pandas_gbq
+
+# Select 163 MB worth of data, to time how long it takes to download large
+# result sets.
+df = pandas_gbq.read_gbq(
+    "SELECT * FROM `bigquery-public-data.usa_names.usa_1910_2013`",
+    dialect="standard",
+)
```
```diff
@@ -0,0 +1,7 @@
+import pandas_gbq
+
+# Select a few KB worth of data, to time downloading small result sets.
+df = pandas_gbq.read_gbq(
+    "SELECT * FROM `bigquery-public-data.utility_us.country_code_iso`",
+    dialect="standard",
+)
```
```diff
@@ -1,5 +1,5 @@
 pandas==0.19.0
 google-auth==1.4.1
 google-auth-oauthlib==0.0.1
-google-cloud-bigquery==0.32.0
+google-cloud-bigquery==1.9.0
 pydata-google-auth==0.1.2
```
This is not generally an issue, as this is a single dtype for all columns; you are much better off constructing a properly dtyped Series in order to use the tz-aware dtypes.
Furthermore, you need to be really clear about whether this is a localized timestamp or not; see the SQL conversion code in pandas, which handles all of this correctly.
Yeah, I wasn't having an issue except on the conda build with pandas pre-wheels. From pandas-dev/pandas#12513 (comment) you can see in the stacktrace that this is actually constructing `Series` and then combining them as a DataFrame. It's the `Series` constructor that's failing on `dtype='datetime64[ns, UTC]'`.

I had failures on the DataFrame constructor too in the unit tests. If it had worked to construct a series, you're right that I'd prefer to have these be tz-aware.
You must have an old version; this was worked out a while back.
OK. I'll double-check the Conda CI config.
I kind of prefer to leave this as-is (not tz-aware) as that's the existing behavior. https://github.com/pydata/pandas-gbq/blob/f729a44a48744acc2898350fbfbded791d900967/pandas_gbq/gbq.py#L647 I should make a separate issue for changing that for TIMESTAMP columns.
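
For readers following the thread, the difference between the two representations under discussion can be sketched with plain pandas. This is an illustration only, not code from pandas-gbq: the timestamp values are made up, and it assumes a pandas version recent enough to support tz-aware Series, which is the point of contention above.

```python
import pandas as pd

# Illustrative values only; they are not taken from the pull request.
utc_index = pd.to_datetime(
    ["2019-01-01 00:00:00", "2019-01-01 12:30:00"], utc=True
)

# tz-aware: each value is an absolute instant, labelled as UTC.
tz_aware = pd.Series(utc_index)

# tz-naive: the same UTC wall-clock values with the timezone label dropped,
# which corresponds to the existing behavior the author prefers to keep.
tz_naive = tz_aware.dt.tz_convert(None)

print(tz_aware.dtype)
print(tz_naive.dtype)
```

Converting with `tz_convert(None)` keeps the UTC wall-clock values, so the two Series differ in whether the timezone is recorded and not in the instants they describe.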