feat: add DataFrame.to_pandas_batches() to download large DataFrame objects #136

Merged: 23 commits, Oct 26, 2023
Commits
c35523c
refactor: make `to_pandas()` call `to_arrow()` and use local dtypes i…
tswast Oct 23, 2023
9f5865c
Merge branch 'main' into b280662868-to_arrow
tswast Oct 24, 2023
c2f9d72
use integer_object_nulls=True to preserve NA/NaN distinction
tswast Oct 24, 2023
7ccb61d
allow NUMERIC/BIGNUMERIC to cast to FLOAT64
tswast Oct 24, 2023
829cf99
better workaround for Float64Dtype NaNs
tswast Oct 25, 2023
c158780
Merge remote-tracking branch 'origin/main' into b280662868-to_arrow
tswast Oct 25, 2023
3a90214
fix type error
tswast Oct 25, 2023
8bdfd79
add unit tests for extreme values
tswast Oct 25, 2023
db81a1c
fix tests on latest pandas
tswast Oct 25, 2023
b25112b
mypy fixes
tswast Oct 25, 2023
a3705f9
fix mod tests
tswast Oct 26, 2023
33a9d9f
Merge remote-tracking branch 'origin/main' into b280662868-to_arrow
tswast Oct 26, 2023
1a7b2d7
feat: add `DataFrame.to_pandas_batches()` to download large `DataFram…
tswast Oct 24, 2023
0c23388
Merge branch 'main' into b280662868-to_pandas_batches
tswast Oct 26, 2023
c4a8b15
allow copies
tswast Oct 26, 2023
239e5ef
allow copies only for contiguous arrays
tswast Oct 26, 2023
ea4d9df
test with chunked_array
tswast Oct 26, 2023
38dba2a
Merge remote-tracking branch 'origin/main' into b280662868-to_pandas_…
tswast Oct 26, 2023
2f225dd
Merge branch 'main' into b280662868-to_pandas_batches
tswast Oct 26, 2023
e1e291d
explain type: ignore
tswast Oct 26, 2023
3166a65
Merge remote-tracking branch 'origin/main' into b280662868-to_pandas_…
tswast Oct 26, 2023
2e218e9
Merge branch 'b280662868-to_pandas_batches' of github.com:googleapis/…
tswast Oct 26, 2023
6dd5aae
Merge branch 'main' into b280662868-to_pandas_batches
gcf-merge-on-green[bot] Oct 26, 2023
test with chunked_array
tswast committed Oct 26, 2023
commit ea4d9dff65e6ddecd1e16fd8c99d901ffc16c4c4
56 changes: 56 additions & 0 deletions tests/unit/session/test_io_pandas.py
@@ -231,6 +231,62 @@
            ),
            id="scalar-dtypes",
        ),
        pytest.param(
            pyarrow.Table.from_pydict(
                {
                    "bool": pyarrow.chunked_array(
                        [[True, None], [True, False]],
                        type=pyarrow.bool_(),
                    ),
                    "bytes": pyarrow.chunked_array(
                        [[b"123", None], [b"abc", b"xyz"]],
                        type=pyarrow.binary(),
                    ),
                    "float": pyarrow.chunked_array(
                        [[1.0, None], [float("nan"), -1.0]],
                        type=pyarrow.float64(),
                    ),
                    "int": pyarrow.chunked_array(
                        [[1, None], [-1, 2**63 - 1]],
                        type=pyarrow.int64(),
                    ),
                    "string": pyarrow.chunked_array(
                        [["123", None], ["abc", "xyz"]],
                        type=pyarrow.string(),
                    ),
                }
            ),
            {
                "bool": "boolean",
                "bytes": "object",
                "float": pandas.Float64Dtype(),
                "int": pandas.Int64Dtype(),
                "string": "string[pyarrow]",
            },
            pandas.DataFrame(
                {
                    "bool": pandas.Series([True, None, True, False], dtype="boolean"),
                    "bytes": [b"123", None, b"abc", b"xyz"],
                    "float": pandas.Series(
                        pandas.arrays.FloatingArray(  # type: ignore
                            numpy.array(
                                [1.0, float("nan"), float("nan"), -1.0], dtype="float64"
                            ),
                            numpy.array([False, True, False, False], dtype="bool"),
                        ),
                        dtype=pandas.Float64Dtype(),
                    ),
                    "int": pandas.Series(
                        [1, None, -1, 2**63 - 1],
                        dtype=pandas.Int64Dtype(),
                    ),
                    "string": pandas.Series(
                        ["123", None, "abc", "xyz"], dtype="string[pyarrow]"
                    ),
                }
            ),
            id="scalar-dtypes-chunked_array",
        ),
        pytest.param(
            pyarrow.Table.from_pydict(
                {