Slow read_gbq compared to QueryJob.to_dataframe #924
Labels
api: bigquery
Issues related to the googleapis/python-bigquery-pandas API.
priority: p3
Desirable enhancement or fix. May not be included in next release.
There is a substantial difference in execution time between pandas_gbq.read_gbq and google.cloud.bigquery.job.query.QueryJob.to_dataframe.
The reason is that pandas_gbq.read_gbq calls rows_iter.to_dataframe with dtypes=conversion_dtypes, which causes the columns to be converted inside RowIterator.to_dataframe.
conversion_dtypes is created this way: conversion_dtypes = _bqschema_to_nullsafe_dtypes(schema_fields)
Why is this the default behavior?
In our case we don't need these costly column conversions.
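One way to quantify the difference is to time both load paths on the same query. The sketch below is a small standard-library harness, not part of either library: compare_loaders is a hypothetical helper name, and sql and project_id in the commented usage are placeholders. Both paths include query execution time, so BigQuery result caching can skew the numbers unless each path is warmed up first.

```python
import statistics
import time
from typing import Callable, Dict


def compare_loaders(
    loaders: Dict[str, Callable[[], object]], repeats: int = 3
) -> Dict[str, float]:
    """Return the median wall-clock seconds taken by each named loader."""
    results = {}
    for name, load in loaders.items():
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            load()  # the result is discarded; only the elapsed time matters
            samples.append(time.perf_counter() - start)
        results[name] = statistics.median(samples)
    return results


# Hypothetical usage (requires credentials; `sql` and `project_id` are
# placeholders that are not defined here):
#
#   import pandas_gbq
#   from google.cloud import bigquery
#
#   client = bigquery.Client(project=project_id)
#   timings = compare_loaders({
#       "read_gbq": lambda: pandas_gbq.read_gbq(sql, project_id=project_id),
#       "to_dataframe": lambda: client.query(sql).to_dataframe(),
#   })
```

If the column conversions are not needed, calling QueryJob.to_dataframe directly, as in the second loader, is the path this issue describes as faster.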
Profiling results
Column transformations take 55% of the running time of RowIterator.to_dataframe.
Environment details
pandas-gbq version: 0.28.0
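A share like the 55% reported under Profiling results could be measured with the standard-library profiler. This is a minimal sketch rather than the method actually used for that figure, which the issue does not state; profile_call is a hypothetical helper name.

```python
import cProfile
import io
import pstats
from typing import Callable


def profile_call(fn: Callable[[], object], top: int = 15) -> str:
    """Run fn under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        fn()
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


# Hypothetical usage: wrap the call under investigation in a lambda, e.g.
#   print(profile_call(lambda: rows_iter.to_dataframe(dtypes=conversion_dtypes)))
# where rows_iter and conversion_dtypes are the objects named in this issue
# and are not defined here.
```

Comparing the cumulative time of the conversion frames against the total for to_dataframe gives the fraction spent on column transformations.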