Description
What happens?
I am running a simple attach to Postgres (16) and copy to Parquet against a partition with 11 billion rows (accessing the partition directly, not the parent table of 46 billion rows):
ATTACH '' AS pg (TYPE POSTGRES, SECRET my_secret, SCHEMA 'my_schema');
COPY pg.my_schema.my_table_t6 TO 'my_table_t6.parquet' (FORMAT PARQUET, OVERWRITE);
With the configuration options left untouched, it runs with 64 (the default maximum) parallel connections, but aborts with ERROR: invalid input syntax for type tid: "(4295022000,0)".
This in turn is followed by a ROLLBACK that lasts for a good hour or so, which seems odd for a read-only operation.
Researching this error suggests it was reported against much older versions of Postgres (v9) but has not been common in recent years.
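One observation, not a confirmed diagnosis: the block number in the error message is larger than an unsigned 32-bit integer can hold, which is the range of the block-number part of a Postgres tid. The check below is only a sketch of that arithmetic in plain DuckDB SQL; it does not touch the attached database, and whether the overflow is the actual cause is an assumption.

```sql
-- Compare the block number from the error message with the
-- unsigned 32-bit maximum (2^32 - 1 = 4294967295).
SELECT
    4295022000 > 4294967295 AS exceeds_uint32_max,
    4295022000 - 4294967295 AS blocks_past_max;
```

The first column is true and the second is 54705, so the requested block lies just past the 32-bit limit.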
Invoking SET pg_use_ctid_scan = false; allows the copy to complete, but limits the query to one thread and takes about 12 hours, even when running and writing on the same server.
The error occurs regardless of whether the source table is being modified during execution.
I've run this several times on different tables with row counts in the 10 billion range.
However, the statements execute without error for tables in the 1 billion range.
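A possible diagnostic for narrowing this down, as an untested sketch: turn on the extension's query logging so the ctid ranges sent to Postgres become visible, and ask Postgres how many pages the partition has. It assumes the pg attachment and the my_schema.my_table_t6 names from the example above; pg_debug_show_queries and postgres_query are the setting and table function described in the postgres extension documentation, and their output on the failing table has not been checked here.

```sql
-- Print the queries the extension sends to Postgres, including the
-- ctid ranges used for the parallel scan.
SET pg_debug_show_queries = true;

-- Ask Postgres for the planner's page estimate and the page count
-- derived from the on-disk size of the partition.
SELECT *
FROM postgres_query('pg', $$
    SELECT c.relpages,
           pg_relation_size(c.oid) / current_setting('block_size')::bigint AS actual_pages
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE n.nspname = 'my_schema'
      AND c.relname = 'my_table_t6'
$$);
```

Comparing these page counts with the block numbers in the logged ctid ranges would show whether the scan is asking for blocks beyond the end of the table.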
To Reproduce
ATTACH '' AS pg (TYPE POSTGRES, SECRET my_secret, SCHEMA 'my_schema');
COPY pg.my_schema.my_table_t6 TO 'my_table_t6.parquet' (FORMAT PARQUET, OVERWRITE);
OS:
DuckDB on RHEL 8 host, Postgres in a Debian container on the same host
PostgreSQL Version:
16
DuckDB Version:
1.2.2
DuckDB Client:
CLI
Full Name:
Michael DiSibio
Affiliation:
MDT
Have you tried this on the latest main branch?
- I agree
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- I agree