Skip to content

Fix: optimize projections for unnest logical plan. #16632

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

bert-beyondloops
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

See issue

What changes are included in this PR?

Are these changes tested?

This PR serves as a starting point for discussion in the community.

Are there any user-facing changes?

No

@github-actions github-actions bot added the optimizer Optimizer rules label Jun 30, 2025
@alamb
Copy link
Contributor

alamb commented Jul 7, 2025

Thanks @bert-beyondloops -- sorry for the delay in review

I started the CI tests

The changes in this PR make sense to me -- the only thing I think it needs is some test to ensure we don't regress

Can you please add some tests, ideally in sqllogicteset format?

Here are the instructions: https://github.com/apache/datafusion/tree/main/datafusion/sqllogictest

Ideally you should be able to extend one of the existing test files in https://github.com/apache/datafusion/tree/main/datafusion/sqllogictest/test_files

Perhaps in https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/unnest.slt

Specifically, I think an explain plan test that shows the projection being pushed down would be about all we need

@bert-beyondloops
Copy link
Contributor Author

Hi @alamb,

Thanks for the review.

There are already some tests verifying the explain plans for the unnest plan. The scheduled run will currently fail.
I'll adapt the 2 test failures

@alamb
Copy link
Contributor

alamb commented Jul 7, 2025

There are already some tests verifying the explain plans for the unnest plan. The scheduled run will currently fail.
I'll adapt the 2 test failures

Indeed the exising tests definitely seem to cover it -- you can see the reduction in projections

2. query result mismatch:
[SQL] explain select uc2 from (select unnest(column2) as uc2, column1 from v) where uc2 > 3;
[Diff] (-expected|+actual)
    physical_plan
    01)ProjectionExec: expr=[__unnest_placeholder(v.column2,depth=1)@0 as uc2]
    02)--CoalesceBatchesExec: target_batch_size=8192
    03)----FilterExec: __unnest_placeholder(v.column2,depth=1)@0 > 3
    04)------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
-   05)--------ProjectionExec: expr=[__unnest_placeholder(v.column2,depth=1)@0 as __unnest_placeholder(v.column2,depth=1)]
-   06)----------UnnestExec
-   07)------------ProjectionExec: expr=[column2@1 as __unnest_placeholder(v.column2), column1@0 as column1]
-   08)--------------DataSourceExec: partitions=1, partition_sizes=[1]
+   05)--------UnnestExec
+   06)----------ProjectionExec: expr=[column2@0 as __unnest_placeholder(v.column2)]
+   07)------------DataSourceExec: partitions=1, partition_sizes=[1]
at /home/runner/work/datafusion/datafusion/datafusion/sqllogictest/test_files/push_down_filter.slt:54

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jul 7, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @bert-beyondloops -- this looks good to me

05)--------CoalesceBatchesExec: target_batch_size=8192
06)----------FilterExec: column1@0 = 2
06)----------FilterExec: column1@0 = 2, projection=[column2@1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those look like better plans to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed.

I'm still looking into the panicking issue.

@github-actions github-actions bot added logical-expr Logical plan and expressions proto Related to proto crate labels Jul 8, 2025
@bert-beyondloops
Copy link
Contributor Author

bert-beyondloops commented Jul 8, 2025

I moved the logic around the unnest column detection into a new Unnest::try_new method.
In my opinion, this did not belong into the LogicalPlanBuilder.

The projection pushdown optimization code has been improved/fixed

All tests are green now.

@bert-beyondloops bert-beyondloops requested a review from alamb July 8, 2025 10:48
@alamb
Copy link
Contributor

alamb commented Jul 8, 2025

This PR appears to have a conflict now

Screenshot 2025-07-08 at 9 35 09 AM

@@ -3995,6 +3995,211 @@ impl PartialOrd for Unnest {
}
}

impl Unnest {
pub fn try_new(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this looks much nicer

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good -- thank you @bert-beyondloops

options: into_required!(unnest.options)?,
}))

LogicalPlanBuilder::from(input)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a nice improvement too

@bert-beyondloops
Copy link
Contributor Author

bert-beyondloops commented Jul 8, 2025

branch has been rebased.
Should I squash commit or is this done during merge ?

@alamb
Copy link
Contributor

alamb commented Jul 8, 2025

branch has been rebased. Should I squash commit or is this done during merge ?

commits are squashed on merge so no need to do it on the branch

Pushing commits rather than rebasing also helps make it easier to see what has changed since the last review, in my opinion, though I know there are other opinions on this matter

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions optimizer Optimizer rules proto Related to proto crate sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unnest logical plan lacks decent projection push down
2 participants