Fix #328 - Don't do limit pushdown during parallel execution #330

Merged: 2 commits into duckdb:main, Jun 12, 2025

Conversation

@noahisaksen (Contributor) commented on May 29, 2025

Fixes #328.

LIMIT pushdown was added to this extension in #313.

It pushes the LIMIT down to Postgres and removes it from the DuckDB query plan.
The problem is that the extension can scan in parallel, turning one DuckDB query into n Postgres queries. Pushing the LIMIT down to each of those queries then falsifies the result: every task returns up to LIMIT rows, and because the LIMIT was removed from the DuckDB plan, the combined result can contain more rows than requested.

A test is added that reproduces this by forcing parallelism with set pg_pages_per_task=1.

I added a check on max_threads: LIMIT is NOT pushed down if max_threads is anything other than 1.
To the best of my knowledge, this prevents LIMIT from being pushed down in any case where we would run parallel queries. (There could be a better way to check for parallelism that I am not aware of.)
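A minimal sketch of the check, assuming the pushdown happens where the scan's bind data is populated; the struct and function names below are illustrative, not the extension's actual code:

#include <cstdint>
#include <string>

// Illustrative stand-in for the Postgres scan's bind data; the real struct
// in the extension has more fields and may use different names/types.
struct ScanBindData {
    std::string limit;        // LIMIT clause appended to the generated Postgres query
    uint64_t max_threads = 0; // how many parallel scan tasks may be launched
};

// Only push LIMIT down when the scan runs as a single task. With more than
// one task, every ctid-range query would apply the LIMIT independently and
// the combined result would contain too many rows.
void TryPushdownLimit(ScanBindData &bind_data, uint64_t limit_value) {
    if (bind_data.max_threads != 1) {
        return; // parallel scan possible: keep the LIMIT in the DuckDB plan
    }
    bind_data.limit = " LIMIT " + std::to_string(limit_value);
}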

Previously

set pg_pages_per_task=1;
FROM s.large_tbl LIMIT 1;

COPY (SELECT "i" FROM "public"."large_tbl" WHERE ctid BETWEEN '(0,0)'::tid AND '(1,0)'::tid LIMIT 1) TO STDOUT (FORMAT "binary");

COPY (SELECT "i" FROM "public"."large_tbl" WHERE ctid BETWEEN '(1,0)'::tid AND '(2,0)'::tid LIMIT 1) TO STDOUT (FORMAT "binary");

COPY (SELECT "i" FROM "public"."large_tbl" WHERE ctid BETWEEN '(2,0)'::tid AND '(3,0)'::tid LIMIT 1) TO STDOUT (FORMAT "binary");

COPY (SELECT "i" FROM "public"."large_tbl" WHERE ctid BETWEEN '(3,0)'::tid AND '(4,0)'::tid LIMIT 1) TO STDOUT (FORMAT "binary");

COPY (SELECT "i" FROM "public"."large_tbl" WHERE ctid BETWEEN '(4,0)'::tid AND '(5,0)'::tid LIMIT 1) TO STDOUT (FORMAT "binary");

┌────────────┐
│     i      │
│   int64    │
├────────────┤
│      70286 │
│      72772 │
│      74128 │
│      75032 │
│      76388 │
│      79100 │
└────────────┘

After (no LIMIT pushdown during parallel execution)

set pg_pages_per_task=1;
FROM s.large_tbl LIMIT 1;

COPY (SELECT "i" FROM "public"."large_tbl" WHERE ctid BETWEEN '(0,0)'::tid AND '(1,0)'::tid) TO STDOUT (FORMAT "binary");

COPY (SELECT "i" FROM "public"."large_tbl" WHERE ctid BETWEEN '(1,0)'::tid AND '(2,0)'::tid) TO STDOUT (FORMAT "binary");

COPY (SELECT "i" FROM "public"."large_tbl" WHERE ctid BETWEEN '(2,0)'::tid AND '(3,0)'::tid) TO STDOUT (FORMAT "binary");

COPY (SELECT "i" FROM "public"."large_tbl" WHERE ctid BETWEEN '(3,0)'::tid AND '(4,0)'::tid) TO STDOUT (FORMAT "binary");

COPY (SELECT "i" FROM "public"."large_tbl" WHERE ctid BETWEEN '(4,0)'::tid AND '(5,0)'::tid) TO STDOUT (FORMAT "binary");

┌───────┐
│   i   │
│ int64 │
├───────┤
│   0   │
└───────┘

@Mytherin (Contributor) commented on Jun 10, 2025

Thanks for the PR! The problem makes sense to me. Perhaps, instead of not applying the optimization, we could set the threads to 1 when the optimization is applied? i.e. where we have bind_data.limit = ... we also set bind_data.max_threads = 1. That should also fix the issue while still allowing the optimization to be applied.
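A rough sketch of that suggestion, reusing the illustrative ScanBindData struct from the sketch above (again, the names are hypothetical and not the extension's actual API):

// Suggested alternative: keep the LIMIT pushdown, but pin the scan to a
// single task so only one Postgres query is issued and the pushed-down
// LIMIT stays correct.
void ApplyLimitPushdown(ScanBindData &bind_data, uint64_t limit_value) {
    bind_data.limit = " LIMIT " + std::to_string(limit_value);
    bind_data.max_threads = 1; // disable parallel ctid-range scans for this query
}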

@Mytherin merged commit c0411b9 into duckdb:main on Jun 12, 2025 (16 checks passed).
@Mytherin (Contributor) commented:
I'll merge this as-is for now; we can pick up enabling the optimization in more scenarios later on.
