[SPARK-42746][SQL][FIXUP] Fix optimizer failure for SortOrder in the LISTAGG function #51117

Open
wants to merge 4 commits into
base: master
@@ -317,21 +317,43 @@ private[aggregate] object CollectTopK {
case class ListAgg(
child: Expression,
delimiter: Expression = Literal(null),
orderExpressions: Seq[SortOrder] = Nil,
orderChildExpressions: Seq[Expression] = Nil,
orderDirections: Seq[SortDirection] = Nil,
orderNullOrderings: Seq[NullOrdering] = Nil,
orderSameOrderExpressions: Seq[Seq[Expression]] = Nil,
Contributor @cloud-fan, Jun 10, 2025:

Do we need this? There is no SortOrder anymore, and we don't need it to help the planner remove duplicated sorts.

Contributor Author @uros-db, Jun 11, 2025:

SortOrder is used because of SupportsOrderingWithinGroup. Note that withOrderingWithinGroup uses all of the relevant children for ListAgg in its copy call, and then later we rely on orderExpressions, which is no longer a child (important because SortOrder is unevaluable). So for this approach to work, I think we need this. We could of course explore alternative approaches if you have any suggestions.
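As a rough illustration of the flattening described above, here is a minimal sketch in plain Scala. The types `Dir`, `NullPos`, `SortKey`, and `FlatOrdering` are hypothetical stand-ins, not Spark's `SortDirection`, `NullOrdering`, `SortOrder`, or `ListAgg`; expressions are modeled as plain strings. It only shows the idea of storing sort keys as parallel sequences and rebuilding them on demand with the same defaults the diff uses (ascending, nulls last).

```scala
// Simplified stand-ins for the Catalyst types; these are NOT the Spark classes.
sealed trait Dir
case object Asc extends Dir
case object Desc extends Dir

sealed trait NullPos
case object First extends NullPos
case object Last extends NullPos

// Hypothetical sort key: an expression name plus its ordering metadata.
final case class SortKey(child: String, direction: Dir, nullPos: NullPos)

// Flattened form: the sortable children in one Seq, plain metadata in parallel Seqs.
final case class FlatOrdering(
    children: Seq[String] = Nil,
    directions: Seq[Dir] = Nil,
    nullPositions: Seq[NullPos] = Nil) {

  // Rebuild the sort keys on demand, falling back to defaults when a
  // metadata Seq is shorter than the children Seq.
  lazy val sortKeys: Seq[SortKey] = children.zipWithIndex.map { case (c, i) =>
    SortKey(
      child = c,
      direction = directions.lift(i).getOrElse(Asc),
      nullPos = nullPositions.lift(i).getOrElse(Last))
  }
}

object FlatOrdering {
  // Split a list of sort keys into the parallel sequences.
  def fromSortKeys(keys: Seq[SortKey]): FlatOrdering =
    FlatOrdering(keys.map(_.child), keys.map(_.direction), keys.map(_.nullPos))
}
```

In this sketch, `FlatOrdering.fromSortKeys(keys).sortKeys` returns the original keys, which is the round trip the flattened constructor parameters are meant to preserve.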

mutableAggBufferOffset: Int = 0,
inputAggBufferOffset: Int = 0)
extends Collect[mutable.ArrayBuffer[Any]]
with SupportsOrderingWithinGroup
with ImplicitCastInputTypes {

val orderExpressions: Seq[SortOrder] = orderChildExpressions.zipWithIndex.map {
Contributor:

Should this be a lazy val?

Contributor Author:

It can be, but I'm not sure it would bring any special value to this particular expression. No strong opinion, I can update if needed.

Contributor:

It should be a lazy val, to avoid repeatedly triggering it during plan transformation, when plan nodes are copied frequently.

Contributor Author:

Makes sense, I'll update to lazy.
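To make the `val` versus `lazy val` point concrete, here is a small sketch in plain Scala. `EagerNode`, `LazyNode`, `DerivedFieldCounter`, and `CopyDemo` are hypothetical names invented for this illustration and have nothing to do with Spark's classes. The counters record how often the derived field is computed when a node is copied several times and the field is read only once at the end.

```scala
import java.util.concurrent.atomic.AtomicInteger

object DerivedFieldCounter {
  // Count how many times each derived field is computed (illustration only).
  val eagerBuilds = new AtomicInteger(0)
  val lazyBuilds = new AtomicInteger(0)
}

// Strict val: the derived field is recomputed on every construction, including copies.
final case class EagerNode(keys: Seq[String]) {
  val derived: Seq[String] = {
    DerivedFieldCounter.eagerBuilds.incrementAndGet()
    keys.map(_.toUpperCase)
  }
}

// Lazy val: the derived field is computed only on first access of a given instance.
final case class LazyNode(keys: Seq[String]) {
  lazy val derived: Seq[String] = {
    DerivedFieldCounter.lazyBuilds.incrementAndGet()
    keys.map(_.toUpperCase)
  }
}

object CopyDemo {
  // Simulate a transformation that copies a node `copies` times and reads
  // the derived field once at the end. Returns (eager builds, lazy builds).
  def run(copies: Int): (Int, Int) = {
    DerivedFieldCounter.eagerBuilds.set(0)
    DerivedFieldCounter.lazyBuilds.set(0)
    var eager = EagerNode(Seq("a"))
    var lzy = LazyNode(Seq("a"))
    for (_ <- 1 to copies) {
      eager = eager.copy()
      lzy = lzy.copy()
    }
    require(eager.derived == lzy.derived)
    (DerivedFieldCounter.eagerBuilds.get(), DerivedFieldCounter.lazyBuilds.get())
  }
}
```

With three copies, the strict version computes the field once per instance (the original plus each copy), while the lazy version computes it only for the instance that is actually read.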

case (orderChild, i) =>
SortOrder(
child = orderChild,
direction = if (i < orderDirections.length) orderDirections(i) else Ascending,
nullOrdering = if (i < orderNullOrderings.length) orderNullOrderings(i) else NullsLast,
sameOrderExpressions = if (i < orderSameOrderExpressions.length) {
orderSameOrderExpressions(i)
} else {
Seq.empty
}
)
}

override def orderingFilled: Boolean = orderExpressions.nonEmpty

override def isOrderingMandatory: Boolean = false

override def isDistinctSupported: Boolean = true

override def withOrderingWithinGroup(orderingWithinGroup: Seq[SortOrder]): AggregateFunction =
copy(orderExpressions = orderingWithinGroup)
copy(
orderChildExpressions = orderingWithinGroup.map(_.child),
orderDirections = orderingWithinGroup.map(_.direction),
orderNullOrderings = orderingWithinGroup.map(_.nullOrdering),
orderSameOrderExpressions = orderingWithinGroup.map(_.sameOrderExpressions)
)

override protected lazy val bufferElementType: DataType = {
if (!needSaveOrderValue) {
@@ -347,10 +369,10 @@ case class ListAgg(
lazy val needSaveOrderValue: Boolean = !isOrderCompatible(orderExpressions)

def this(child: Expression) =
this(child, Literal(null), Nil, 0, 0)
this(child, Literal(null), Nil, Nil, Nil, Nil, 0, 0)

def this(child: Expression, delimiter: Expression) =
this(child, delimiter, Nil, 0, 0)
this(child, delimiter, Nil, Nil, Nil, Nil, 0, 0)

override def nullable: Boolean = true

@@ -534,14 +556,18 @@ case class ListAgg(
false
}

override protected def withNewChildrenInternal(newChildren: IndexedSeq[Expression]): Expression =
override protected def withNewChildrenInternal(
newChildren: IndexedSeq[Expression]): Expression = {
val sortOrderExpressions: Seq[SortOrder] = newChildren.drop(2).map(_.asInstanceOf[SortOrder])
Contributor:

Do we really call ListAgg#withChildren this way, passing SortOrder as the new children?

Contributor Author:

Yes, because children is overridden (please see #51117 (comment)).

copy(
child = newChildren.head,
delimiter = newChildren(1),
orderExpressions = newChildren
.drop(2)
.map(_.asInstanceOf[SortOrder])
newChildren.head,
newChildren(1),
orderChildExpressions = sortOrderExpressions.map(_.child),
orderDirections = sortOrderExpressions.map(_.direction),
orderNullOrderings = sortOrderExpressions.map(_.nullOrdering),
orderSameOrderExpressions = sortOrderExpressions.map(_.sameOrderExpressions)
)
}

private[this] def orderValuesField: Seq[StructField] = {
orderExpressions.zipWithIndex.map {
@@ -258,12 +258,18 @@ object RewriteDistinctAggregates extends Rule[LogicalPlan] {
def patchAggregateFunctionChildren(
af: AggregateFunction)(
attrs: Expression => Option[Expression]): AggregateFunction = {
val newChildren = af.children.map(c => attrs(c).getOrElse(c))
val newChildren = af.children.map {
case so: SortOrder =>
Contributor:

So is SortOrder the only expression that should propagate into child's child?

Contributor Author:

Yes, this is the only one that I'm aware of, with respect to aggregate patching. Note that the issue originally came up with ListAgg - which is a bit of a specific aggregate expression. I don't think that we're aware of any other similar issues at this time, but please feel free to share any additional info.

Contributor:

Sorry I'm confused. SortOrder is no longer child of ListAgg, how can we hit it here?

Contributor Author:

SortOrder is no longer child of ListAgg, how can we hit it here?

That's because of this:

override def children: Seq[Expression] = child +: delimiter +: orderExpressions

Otherwise, it would be more difficult to pack up all of the optional / dynamic order expressions (that themselves include Seq[Expression] and even Seq[Seq[Expression]]).
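The round trip being discussed (derived wrapper nodes exposed through `children`, then unpacked again when new children are supplied) can be sketched as follows. This is a simplified model in plain Scala under stated assumptions: `Expr`, `Col`, `Lit`, `Order`, and `Agg` are hypothetical types, not Catalyst's `Expression`, `SortOrder`, or `ListAgg`, and the sketch uses `collect` where the diff uses `asInstanceOf`.

```scala
// Minimal stand-in for a Catalyst-style expression tree; not Spark's API.
sealed trait Expr { def children: Seq[Expr] }
final case class Col(name: String) extends Expr { def children: Seq[Expr] = Nil }
final case class Lit(value: String) extends Expr { def children: Seq[Expr] = Nil }

// Wrapper carrying ordering metadata around a child expression.
final case class Order(child: Expr, descending: Boolean) extends Expr {
  def children: Seq[Expr] = Seq(child)
}

// Hypothetical aggregate: ordering is stored in flattened form, but
// `children` exposes Order nodes that are generated from those fields.
final case class Agg(
    child: Expr,
    delimiter: Expr,
    orderChildren: Seq[Expr] = Nil,
    orderDescending: Seq[Boolean] = Nil) extends Expr {

  lazy val orders: Seq[Order] = orderChildren.zipWithIndex.map { case (c, i) =>
    Order(c, orderDescending.lift(i).getOrElse(false))
  }

  def children: Seq[Expr] = child +: delimiter +: orders

  // Inverse of `children`: unpack the Order wrappers back into constructor fields.
  def withNewChildren(newChildren: Seq[Expr]): Agg = {
    val newOrders = newChildren.drop(2).collect { case o: Order => o }
    copy(
      child = newChildren.head,
      delimiter = newChildren(1),
      orderChildren = newOrders.map(_.child),
      orderDescending = newOrders.map(_.descending))
  }
}
```

In this model, `agg.withNewChildren(agg.children)` reproduces `agg`, which is the invariant the generated children need to satisfy for tree transformations to be safe.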

Contributor:

Why do we generate SortOrder and then put it in the children? Can we just put orderChildExpressions in the children?

Contributor:

ideally children should be selected from the constructor parameters, not something generated on the fly.

Contributor Author:

ideally children should be selected from the constructor parameters, not something generated on the fly.

Agreed. However, I don't think that's viable with this particular expression (please see below for more details).

Can we just put orderChildExpressions in the children?

I don't think that orderChildExpressions is enough; we need the full SortOrder information in order to be able to reconstruct the original aggregate. This can be either in a SortOrder object (as is currently implemented) or a sort of struct containing all of the relevant parameters (which would essentially be SortOrder, but with extra steps). Otherwise, packing up all optional / dynamic child expressions in a children: Seq[Expression] would be very difficult.

Contributor:

Can you point me to the code where we collect the full sort order information from children? Why can't we match the ListAgg expression and collect whatever info we need?

so.copy(child = attrs(so.child).getOrElse(so.child))
case c =>
attrs(c).getOrElse(c)
}
af.withNewChildren(newChildren).asInstanceOf[AggregateFunction]
}
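The patching logic in this hunk can be sketched in isolation. The following is a minimal model in plain Scala, assuming hypothetical `Node`, `Ref`, and `Sorted` types in place of Catalyst's `Expression`, attribute references, and `SortOrder`. It shows why the wrapper case is needed: the replacement map is keyed by the wrapped child, so a lookup on the wrapper itself would miss. The second helper mirrors the `.filter(!_.isInstanceOf[SortOrder]).distinct` line that appears later in the same hunk.

```scala
// Simplified stand-ins for expressions; not Spark's classes.
sealed trait Node
final case class Ref(name: String) extends Node
final case class Sorted(child: Node, descending: Boolean) extends Node

object PatchDemo {
  // Replace each child via `attrs`. For a Sorted wrapper, look one level
  // down and rewrite the wrapped child while keeping the ordering metadata.
  def patch(children: Seq[Node])(attrs: Node => Option[Node]): Seq[Node] =
    children.map {
      case s: Sorted => s.copy(child = attrs(s.child).getOrElse(s.child))
      case c => attrs(c).getOrElse(c)
    }

  // Collect the distinct children that need a replacement attribute,
  // skipping the Sorted wrappers themselves.
  def distinctTargets(children: Seq[Node]): Seq[Node] =
    children.filter(!_.isInstanceOf[Sorted]).distinct
}
```

For example, with a map from `Ref("col1")` to `Ref("attr1")`, patching `Seq(Ref("col1"), Sorted(Ref("col1"), true))` rewrites both the plain reference and the one inside the wrapper, and the descending flag is preserved.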

// Setup unique distinct aggregate children.
val distinctAggChildren = distinctAggGroups.keySet.flatten.toSeq.distinct
val distinctAggChildren = distinctAggGroups.keySet.flatten.toSeq
Contributor:

Is this safe to do? Does any other aggregate expression use this? Could you please confirm with a test?

Contributor Author:

What exactly are you referring to? If you mean filtering out SortOrder here: as far as I'm aware, the only Catalyst expression that uses it is ListAgg, and tests are added for this expression (including some tests that combine listagg with other aggregates to verify the correct behaviour).

Contributor:

Do we have a way of adding a test for this rule, so that if a new expression is added with SortOrder, we actually alert with a proper error? I just want to make sure we do not get into the same state in the future, where we have bugs because of optimization.

.filter(!_.isInstanceOf[SortOrder]).distinct
val distinctAggChildAttrMap = distinctAggChildren.map { e =>
e.canonicalized -> AttributeReference(e.sql, e.dataType, nullable = true)()
}
@@ -2,7 +2,7 @@
-- !query
SELECT listagg(c1) WITHIN GROUP (ORDER BY c1 COLLATE utf8_binary) FROM (VALUES ('a'), ('A'), ('b'), ('B')) AS t(c1)
-- !query analysis
Aggregate [listagg(c1#x, null, collate(c1#x, utf8_binary) ASC NULLS FIRST, 0, 0) AS listagg(c1, NULL) WITHIN GROUP (ORDER BY collate(c1, utf8_binary) ASC NULLS FIRST)#x]
Aggregate [listagg(c1#x, null, collate(c1#x, utf8_binary), Ascending, NullsFirst, Vector(), 0, 0) AS listagg(c1, NULL) WITHIN GROUP (ORDER BY collate(c1, utf8_binary) ASC NULLS FIRST)#x]
+- SubqueryAlias t
+- Project [col1#x AS c1#x]
+- LocalRelation [col1#x]
@@ -30,7 +30,7 @@ WithCTE
:- CTERelationDef xxxx, false
: +- SubqueryAlias t
: +- Project [listagg(col1, NULL) WITHIN GROUP (ORDER BY col1 ASC NULLS FIRST)#x AS c1#x]
: +- Aggregate [listagg(col1#x, null, col1#x ASC NULLS FIRST, 0, 0) AS listagg(col1, NULL) WITHIN GROUP (ORDER BY col1 ASC NULLS FIRST)#x]
: +- Aggregate [listagg(col1#x, null, col1#x, Ascending, NullsFirst, Vector(), 0, 0) AS listagg(col1, NULL) WITHIN GROUP (ORDER BY col1 ASC NULLS FIRST)#x]
: +- SubqueryAlias __auto_generated_subquery_name
: +- LocalRelation [col1#x]
+- Project [replace(replace(c1#x, , ),
@@ -43,7 +43,7 @@ WithCTE
-- !query
SELECT lower(listagg(c1) WITHIN GROUP (ORDER BY c1 COLLATE utf8_lcase)) FROM (VALUES ('a'), ('A'), ('b'), ('B')) AS t(c1)
-- !query analysis
Aggregate [lower(listagg(c1#x, null, collate(c1#x, utf8_lcase) ASC NULLS FIRST, 0, 0)) AS lower(listagg(c1, NULL) WITHIN GROUP (ORDER BY collate(c1, utf8_lcase) ASC NULLS FIRST))#x]
Aggregate [lower(listagg(c1#x, null, collate(c1#x, utf8_lcase), Ascending, NullsFirst, Vector(), 0, 0)) AS lower(listagg(c1, NULL) WITHIN GROUP (ORDER BY collate(c1, utf8_lcase) ASC NULLS FIRST))#x]
+- SubqueryAlias t
+- Project [col1#x AS c1#x]
+- LocalRelation [col1#x]
@@ -67,7 +67,7 @@ WithCTE
-- !query
SELECT lower(listagg(DISTINCT c1 COLLATE utf8_lcase) WITHIN GROUP (ORDER BY c1 COLLATE utf8_lcase)) FROM (VALUES ('a'), ('B'), ('b'), ('A')) AS t(c1)
-- !query analysis
Aggregate [lower(listagg(distinct collate(c1#x, utf8_lcase), null, collate(c1#x, utf8_lcase) ASC NULLS FIRST, 0, 0)) AS lower(listagg(DISTINCT collate(c1, utf8_lcase), NULL) WITHIN GROUP (ORDER BY collate(c1, utf8_lcase) ASC NULLS FIRST))#x]
Aggregate [lower(listagg(distinct collate(c1#x, utf8_lcase), null, collate(c1#x, utf8_lcase), Ascending, NullsFirst, Vector(), 0, 0)) AS lower(listagg(DISTINCT collate(c1, utf8_lcase), NULL) WITHIN GROUP (ORDER BY collate(c1, utf8_lcase) ASC NULLS FIRST))#x]
+- SubqueryAlias t
+- Project [col1#x AS c1#x]
+- LocalRelation [col1#x]
@@ -95,7 +95,7 @@ WithCTE
:- CTERelationDef xxxx, false
: +- SubqueryAlias t
: +- Project [listagg(col1, NULL) WITHIN GROUP (ORDER BY collate(col1, unicode_rtrim) ASC NULLS FIRST)#x AS c1#x]
: +- Aggregate [listagg(col1#x, null, collate(col1#x, unicode_rtrim) ASC NULLS FIRST, 0, 0) AS listagg(col1, NULL) WITHIN GROUP (ORDER BY collate(col1, unicode_rtrim) ASC NULLS FIRST)#x]
: +- Aggregate [listagg(col1#x, null, collate(col1#x, unicode_rtrim), Ascending, NullsFirst, Vector(), 0, 0) AS listagg(col1, NULL) WITHIN GROUP (ORDER BY collate(col1, unicode_rtrim) ASC NULLS FIRST)#x]
: +- SubqueryAlias __auto_generated_subquery_name
: +- LocalRelation [col1#x]
+- Project [replace(replace(c1#x, , ),
123 changes: 114 additions & 9 deletions sql/core/src/test/resources/sql-tests/analyzer-results/listagg.sql.out
@@ -149,7 +149,7 @@ WithCTE
-- !query
SELECT listagg(col1) WITHIN GROUP (ORDER BY col1) FROM df
-- !query analysis
Aggregate [listagg(col1#x, null, col1#x ASC NULLS FIRST, 0, 0) AS listagg(col1, NULL) WITHIN GROUP (ORDER BY col1 ASC NULLS FIRST)#x]
Aggregate [listagg(col1#x, null, col1#x, Ascending, NullsFirst, Vector(), 0, 0) AS listagg(col1, NULL) WITHIN GROUP (ORDER BY col1 ASC NULLS FIRST)#x]
+- SubqueryAlias df
+- View (`df`, [col1#x, col2#x])
+- Project [cast(col1#x as string) AS col1#x, cast(col2#x as string) AS col2#x]
@@ -161,7 +161,7 @@ Aggregate [listagg(col1#x, null, col1#x ASC NULLS FIRST, 0, 0) AS listagg(col1,
-- !query
SELECT listagg(col1) WITHIN GROUP (ORDER BY col1 DESC) FROM df
-- !query analysis
Aggregate [listagg(col1#x, null, col1#x DESC NULLS LAST, 0, 0) AS listagg(col1, NULL) WITHIN GROUP (ORDER BY col1 DESC NULLS LAST)#x]
Aggregate [listagg(col1#x, null, col1#x, Descending, NullsLast, Vector(), 0, 0) AS listagg(col1, NULL) WITHIN GROUP (ORDER BY col1 DESC NULLS LAST)#x]
+- SubqueryAlias df
+- View (`df`, [col1#x, col2#x])
+- Project [cast(col1#x as string) AS col1#x, cast(col2#x as string) AS col2#x]
@@ -175,7 +175,7 @@ SELECT listagg(col1) WITHIN GROUP (ORDER BY col1 DESC) OVER (PARTITION BY col2)
-- !query analysis
Project [listagg(col1, NULL) WITHIN GROUP (ORDER BY col1 DESC NULLS LAST) OVER (PARTITION BY col2 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#x]
+- Project [col1#x, col2#x, listagg(col1, NULL) WITHIN GROUP (ORDER BY col1 DESC NULLS LAST) OVER (PARTITION BY col2 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#x, listagg(col1, NULL) WITHIN GROUP (ORDER BY col1 DESC NULLS LAST) OVER (PARTITION BY col2 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#x]
+- Window [listagg(col1#x, null, col1#x DESC NULLS LAST, 0, 0) windowspecdefinition(col2#x, specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS listagg(col1, NULL) WITHIN GROUP (ORDER BY col1 DESC NULLS LAST) OVER (PARTITION BY col2 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#x], [col2#x]
+- Window [listagg(col1#x, null, col1#x, Descending, NullsLast, Vector(), 0, 0) windowspecdefinition(col2#x, specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS listagg(col1, NULL) WITHIN GROUP (ORDER BY col1 DESC NULLS LAST) OVER (PARTITION BY col2 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#x], [col2#x]
+- Project [col1#x, col2#x]
+- SubqueryAlias df
+- View (`df`, [col1#x, col2#x])
@@ -188,7 +188,7 @@ Project [listagg(col1, NULL) WITHIN GROUP (ORDER BY col1 DESC NULLS LAST) OVER (
-- !query
SELECT listagg(col1) WITHIN GROUP (ORDER BY col2) FROM df
-- !query analysis
Aggregate [listagg(col1#x, null, col2#x ASC NULLS FIRST, 0, 0) AS listagg(col1, NULL) WITHIN GROUP (ORDER BY col2 ASC NULLS FIRST)#x]
Aggregate [listagg(col1#x, null, col2#x, Ascending, NullsFirst, Vector(), 0, 0) AS listagg(col1, NULL) WITHIN GROUP (ORDER BY col2 ASC NULLS FIRST)#x]
+- SubqueryAlias df
+- View (`df`, [col1#x, col2#x])
+- Project [cast(col1#x as string) AS col1#x, cast(col2#x as string) AS col2#x]
@@ -204,7 +204,7 @@ WithCTE
:- CTERelationDef xxxx, false
: +- SubqueryAlias t
: +- Project [listagg(col1, NULL) WITHIN GROUP (ORDER BY col2 DESC NULLS LAST)#x AS col#x]
: +- Aggregate [listagg(col1#x, null, col2#x DESC NULLS LAST, 0, 0) AS listagg(col1, NULL) WITHIN GROUP (ORDER BY col2 DESC NULLS LAST)#x]
: +- Aggregate [listagg(col1#x, null, col2#x, Descending, NullsLast, Vector(), 0, 0) AS listagg(col1, NULL) WITHIN GROUP (ORDER BY col2 DESC NULLS LAST)#x]
: +- SubqueryAlias df
: +- View (`df`, [col1#x, col2#x])
: +- Project [cast(col1#x as string) AS col1#x, cast(col2#x as string) AS col2#x]
@@ -223,7 +223,7 @@ WithCTE
:- CTERelationDef xxxx, false
: +- SubqueryAlias t
: +- Project [listagg(col1, |) WITHIN GROUP (ORDER BY col2 DESC NULLS LAST)#x AS col#x]
: +- Aggregate [listagg(col1#x, |, col2#x DESC NULLS LAST, 0, 0) AS listagg(col1, |) WITHIN GROUP (ORDER BY col2 DESC NULLS LAST)#x]
: +- Aggregate [listagg(col1#x, |, col2#x, Descending, NullsLast, Vector(), 0, 0) AS listagg(col1, |) WITHIN GROUP (ORDER BY col2 DESC NULLS LAST)#x]
: +- SubqueryAlias df
: +- View (`df`, [col1#x, col2#x])
: +- Project [cast(col1#x as string) AS col1#x, cast(col2#x as string) AS col2#x]
@@ -238,7 +238,7 @@ WithCTE
-- !query
SELECT listagg(col1, '|') WITHIN GROUP (ORDER BY col2 DESC) FROM df
-- !query analysis
Aggregate [listagg(col1#x, |, col2#x DESC NULLS LAST, 0, 0) AS listagg(col1, |) WITHIN GROUP (ORDER BY col2 DESC NULLS LAST)#x]
Aggregate [listagg(col1#x, |, col2#x, Descending, NullsLast, Vector(), 0, 0) AS listagg(col1, |) WITHIN GROUP (ORDER BY col2 DESC NULLS LAST)#x]
+- SubqueryAlias df
+- View (`df`, [col1#x, col2#x])
+- Project [cast(col1#x as string) AS col1#x, cast(col2#x as string) AS col2#x]
@@ -250,7 +250,7 @@ Aggregate [listagg(col1#x, |, col2#x DESC NULLS LAST, 0, 0) AS listagg(col1, |)
-- !query
SELECT listagg(col1) WITHIN GROUP (ORDER BY col2 DESC, col1 ASC) FROM df
-- !query analysis
Aggregate [listagg(col1#x, null, col2#x DESC NULLS LAST, col1#x ASC NULLS FIRST, 0, 0) AS listagg(col1, NULL) WITHIN GROUP (ORDER BY col2 DESC NULLS LAST, col1 ASC NULLS FIRST)#x]
Aggregate [listagg(col1#x, null, col2#x, col1#x, Descending, Ascending, NullsLast, NullsFirst, Vector(), Vector(), 0, 0) AS listagg(col1, NULL) WITHIN GROUP (ORDER BY col2 DESC NULLS LAST, col1 ASC NULLS FIRST)#x]
+- SubqueryAlias df
+- View (`df`, [col1#x, col2#x])
+- Project [cast(col1#x as string) AS col1#x, cast(col2#x as string) AS col2#x]
@@ -262,7 +262,7 @@ Aggregate [listagg(col1#x, null, col2#x DESC NULLS LAST, col1#x ASC NULLS FIRST,
-- !query
SELECT listagg(col1) WITHIN GROUP (ORDER BY col2 DESC, col1 DESC) FROM df
-- !query analysis
Aggregate [listagg(col1#x, null, col2#x DESC NULLS LAST, col1#x DESC NULLS LAST, 0, 0) AS listagg(col1, NULL) WITHIN GROUP (ORDER BY col2 DESC NULLS LAST, col1 DESC NULLS LAST)#x]
Aggregate [listagg(col1#x, null, col2#x, col1#x, Descending, Descending, NullsLast, NullsLast, Vector(), Vector(), 0, 0) AS listagg(col1, NULL) WITHIN GROUP (ORDER BY col2 DESC NULLS LAST, col1 DESC NULLS LAST)#x]
+- SubqueryAlias df
+- View (`df`, [col1#x, col2#x])
+- Project [cast(col1#x as string) AS col1#x, cast(col2#x as string) AS col2#x]
@@ -335,6 +335,111 @@ WithCTE
+- CTERelationRef xxxx, true, [col1#x, col2#x], false, false, 1


-- !query
SELECT
LISTAGG(DISTINCT col1) WITHIN GROUP (ORDER BY col1),
LISTAGG(DISTINCT col2) WITHIN GROUP (ORDER BY col2)
FROM df
-- !query analysis
Aggregate [listagg(distinct col1#x, null, col1#x, Ascending, NullsFirst, Vector(), 0, 0) AS listagg(DISTINCT col1, NULL) WITHIN GROUP (ORDER BY col1 ASC NULLS FIRST)#x, listagg(distinct col2#x, null, col2#x, Ascending, NullsFirst, Vector(), 0, 0) AS listagg(DISTINCT col2, NULL) WITHIN GROUP (ORDER BY col2 ASC NULLS FIRST)#x]
+- SubqueryAlias df
+- View (`df`, [col1#x, col2#x])
+- Project [cast(col1#x as string) AS col1#x, cast(col2#x as string) AS col2#x]
+- Project [col1#x, col2#x]
+- SubqueryAlias __auto_generated_subquery_name
+- LocalRelation [col1#x, col2#x]


-- !query
SELECT
LISTAGG(DISTINCT col1) WITHIN GROUP (ORDER BY col1),
LISTAGG(DISTINCT col2, '|') WITHIN GROUP (ORDER BY col2)
FROM df
-- !query analysis
Aggregate [listagg(distinct col1#x, null, col1#x, Ascending, NullsFirst, Vector(), 0, 0) AS listagg(DISTINCT col1, NULL) WITHIN GROUP (ORDER BY col1 ASC NULLS FIRST)#x, listagg(distinct col2#x, |, col2#x, Ascending, NullsFirst, Vector(), 0, 0) AS listagg(DISTINCT col2, |) WITHIN GROUP (ORDER BY col2 ASC NULLS FIRST)#x]
+- SubqueryAlias df
+- View (`df`, [col1#x, col2#x])
+- Project [cast(col1#x as string) AS col1#x, cast(col2#x as string) AS col2#x]
+- Project [col1#x, col2#x]
+- SubqueryAlias __auto_generated_subquery_name
+- LocalRelation [col1#x, col2#x]


-- !query
SELECT
LISTAGG(DISTINCT col1) WITHIN GROUP (ORDER BY col1),
LISTAGG(DISTINCT col2) WITHIN GROUP (ORDER BY col2 DESC)
FROM df
-- !query analysis
Aggregate [listagg(distinct col1#x, null, col1#x, Ascending, NullsFirst, Vector(), 0, 0) AS listagg(DISTINCT col1, NULL) WITHIN GROUP (ORDER BY col1 ASC NULLS FIRST)#x, listagg(distinct col2#x, null, col2#x, Descending, NullsLast, Vector(), 0, 0) AS listagg(DISTINCT col2, NULL) WITHIN GROUP (ORDER BY col2 DESC NULLS LAST)#x]
+- SubqueryAlias df
+- View (`df`, [col1#x, col2#x])
+- Project [cast(col1#x as string) AS col1#x, cast(col2#x as string) AS col2#x]
+- Project [col1#x, col2#x]
+- SubqueryAlias __auto_generated_subquery_name
+- LocalRelation [col1#x, col2#x]


-- !query
SELECT
LISTAGG(DISTINCT col1) WITHIN GROUP (ORDER BY col1),
LISTAGG(DISTINCT col2) WITHIN GROUP (ORDER BY col2 DESC NULLS LAST)
FROM df
-- !query analysis
Aggregate [listagg(distinct col1#x, null, col1#x, Ascending, NullsFirst, Vector(), 0, 0) AS listagg(DISTINCT col1, NULL) WITHIN GROUP (ORDER BY col1 ASC NULLS FIRST)#x, listagg(distinct col2#x, null, col2#x, Descending, NullsLast, Vector(), 0, 0) AS listagg(DISTINCT col2, NULL) WITHIN GROUP (ORDER BY col2 DESC NULLS LAST)#x]
+- SubqueryAlias df
+- View (`df`, [col1#x, col2#x])
+- Project [cast(col1#x as string) AS col1#x, cast(col2#x as string) AS col2#x]
+- Project [col1#x, col2#x]
+- SubqueryAlias __auto_generated_subquery_name
+- LocalRelation [col1#x, col2#x]


-- !query
SELECT
LISTAGG(DISTINCT col1) WITHIN GROUP (ORDER BY col1),
LISTAGG(DISTINCT col2) WITHIN GROUP (ORDER BY col2 DESC NULLS LAST)
FROM df WHERE col1 > 'a' AND col2 > 'b'
-- !query analysis
Aggregate [listagg(distinct col1#x, null, col1#x, Ascending, NullsFirst, Vector(), 0, 0) AS listagg(DISTINCT col1, NULL) WITHIN GROUP (ORDER BY col1 ASC NULLS FIRST)#x, listagg(distinct col2#x, null, col2#x, Descending, NullsLast, Vector(), 0, 0) AS listagg(DISTINCT col2, NULL) WITHIN GROUP (ORDER BY col2 DESC NULLS LAST)#x]
+- Filter ((col1#x > a) AND (col2#x > b))
+- SubqueryAlias df
+- View (`df`, [col1#x, col2#x])
+- Project [cast(col1#x as string) AS col1#x, cast(col2#x as string) AS col2#x]
+- Project [col1#x, col2#x]
+- SubqueryAlias __auto_generated_subquery_name
+- LocalRelation [col1#x, col2#x]


-- !query
SELECT
LISTAGG(DISTINCT col1) WITHIN GROUP (ORDER BY col1),
COUNT(DISTINCT col2)
FROM df
-- !query analysis
Aggregate [listagg(distinct col1#x, null, col1#x, Ascending, NullsFirst, Vector(), 0, 0) AS listagg(DISTINCT col1, NULL) WITHIN GROUP (ORDER BY col1 ASC NULLS FIRST)#x, count(distinct col2#x) AS count(DISTINCT col2)#xL]
+- SubqueryAlias df
+- View (`df`, [col1#x, col2#x])
+- Project [cast(col1#x as string) AS col1#x, cast(col2#x as string) AS col2#x]
+- Project [col1#x, col2#x]
+- SubqueryAlias __auto_generated_subquery_name
+- LocalRelation [col1#x, col2#x]


-- !query
SELECT
col1,
LISTAGG(DISTINCT col2) WITHIN GROUP (ORDER BY col2 DESC),
COUNT(DISTINCT col3)
FROM (VALUES ('A', 'x', '1'), ('A', 'y', '2'), ('B', 'y', '2'), ('B', 'z', '3')) AS tbl(col1, col2, col3)
GROUP BY col1
-- !query analysis
Aggregate [col1#x], [col1#x, listagg(distinct col2#x, null, col2#x, Descending, NullsLast, Vector(), 0, 0) AS listagg(DISTINCT col2, NULL) WITHIN GROUP (ORDER BY col2 DESC NULLS LAST)#x, count(distinct col3#x) AS count(DISTINCT col3)#xL]
+- SubqueryAlias tbl
+- Project [col1#x AS col1#x, col2#x AS col2#x, col3#x AS col3#x]
+- LocalRelation [col1#x, col2#x, col3#x]


-- !query
SELECT listagg(c1) FROM (VALUES (ARRAY('a', 'b'))) AS t(c1)
-- !query analysis