[SPARK-52439][SQL] Support creating check constraint with NULL #51146

Open: wants to merge 8 commits into base: master

gengliangwang
Member

@gengliangwang gengliangwang commented Jun 10, 2025

What changes were proposed in this pull request?

  • Support check constraint with NULL value
  • Add test cases for constraints with constant expressions.

Why are the changes needed?

Before this change, creating a check constraint with a null condition, such as CHECK(null), failed with a compilation error. This PR fixes the issue.

Does this PR introduce any user-facing change?

No

How was this patch tested?

New UT

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Jun 10, 2025
@gengliangwang gengliangwang requested a review from cloud-fan June 10, 2025 23:07
@gengliangwang
Member Author

cc @aokolnychyi as well

@gengliangwang gengliangwang changed the title from "[SPARK-52439][SQL] Support check constraint with null value" to "[SPARK-52439][SQL] Support creating check constraint with NULL" Jun 10, 2025
@@ -78,6 +78,7 @@ class V2ExpressionBuilder(e: Expression, isPredicate: Boolean = false) extends L
expr: Expression, isPredicate: Boolean = false): Option[V2Expression] = expr match {
case Literal(true, BooleanType) => Some(new AlwaysTrue())
case Literal(false, BooleanType) => Some(new AlwaysFalse())
case Cast(Literal(null, NullType), BooleanType, _, _) if isPredicate => Some(new AlwaysNull())
Contributor

why is it not constant folded to Literal(null, BooleanType)?

Member Author

This happens in analyzer rule ResolveTableSpec.
If we constant-fold it proactively, the current-like expressions won't be resolved.

Contributor

Hm, doing the conversion to a V2 predicate in ResolveTableSpec means we will not be able to optimize the conditions before the conversion... This is unfortunate.

Contributor

This is actually quite problematic; the same applies to default values. We should optimize/fold some of these expressions in the optimizer before converting to DSv2.

Contributor

I think the v2 Predicate is a big mistake... We never know if an expression is a predicate or not statically. In fact, we only require boolean-type expressions in catalyst to be used as predicates.

In this case, a null literal can be a predicate, a CAST to boolean is a predicate, but they do not extend the v2 Predicate class and we can't make the code compile.

It's too late to remove v2 Predicate, as that ship has already sailed, but we can silently kill it by adding a new BooleanExpression v2 expression that extends Predicate and simply wraps any v2 expression that returns boolean type.
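
The wrapper idea described here could look roughly like the following self-contained Java sketch. Every type in it (`SketchExpression`, `SketchPredicate`, `SketchCast`, `BooleanExpressionSketch`) is a simplified stand-in invented for illustration and is not Spark's actual DSv2 API; the string-based `dataTypeName()` is likewise an assumption made to keep the example free of dependencies.

```java
// Hypothetical, simplified model of DSv2 expressions; not Spark's real API.
interface SketchExpression {
  String describe();
  String dataTypeName(); // e.g. "boolean" or "int"
}

// Stand-in for the existing v2 Predicate base class: always boolean-typed.
abstract class SketchPredicate implements SketchExpression {
  @Override
  public final String dataTypeName() {
    return "boolean";
  }
}

// A boolean-typed expression that does not extend the predicate class,
// such as CAST(NULL AS boolean).
final class SketchCast implements SketchExpression {
  private final String input;
  private final String targetType;

  SketchCast(String input, String targetType) {
    this.input = input;
    this.targetType = targetType;
  }

  @Override
  public String describe() {
    return "CAST(" + input + " AS " + targetType + ")";
  }

  @Override
  public String dataTypeName() {
    return targetType;
  }
}

// The wrapper idea: adapt any boolean-typed expression to the predicate type.
final class BooleanExpressionSketch extends SketchPredicate {
  private final SketchExpression child;

  BooleanExpressionSketch(SketchExpression child) {
    // Only boolean-typed expressions may be used as predicates.
    if (!"boolean".equals(child.dataTypeName())) {
      throw new IllegalArgumentException(
          "Expected a boolean expression but got: " + child.dataTypeName());
    }
    this.child = child;
  }

  SketchExpression child() {
    return child;
  }

  @Override
  public String describe() {
    return child.describe();
  }
}
```

With such a wrapper, `new BooleanExpressionSketch(new SketchCast("NULL", "boolean"))` type-checks wherever a predicate is required, while wrapping a non-boolean expression is rejected at construction time rather than at compile time.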

Contributor

@cloud-fan cloud-fan Jun 23, 2025

Please take a look at #51247

* @since 4.1.0
*/
@Evolving
public final class AlwaysNull extends Predicate implements Literal<Boolean> {
Contributor

do we have to add it? can v2 Literal do the work?

Member Author

It has to be a V2 Predicate:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala#L58

Also, having AlwaysNull besides AlwaysTrue and AlwaysFalse is reasonable.
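
To illustrate the constraint, here is a minimal Java sketch of a builder that must return a predicate type. All `Toy*` names are hypothetical stand-ins made up for this example, not Spark classes; only the idea of having `AlwaysNull` alongside `AlwaysTrue` and `AlwaysFalse` comes from the PR. The sketch models predicate mode only and ignores the `isPredicate` flag.

```java
import java.util.Optional;

// Toy stand-ins for Catalyst expressions (hypothetical, not Spark classes).
interface ToyCatalystExpr {}

final class ToyBooleanLiteral implements ToyCatalystExpr {
  final boolean value;

  ToyBooleanLiteral(boolean value) {
    this.value = value;
  }
}

// Models Cast(Literal(null, NullType), BooleanType) reaching the builder unfolded.
final class ToyCastNullToBoolean implements ToyCatalystExpr {}

// Models any expression the builder cannot translate.
final class ToyUnsupported implements ToyCatalystExpr {}

// Toy stand-in for the v2 Predicate type that the builder must return.
abstract class ToyPredicate {
  // The constant result of the predicate; null models SQL NULL.
  abstract Boolean value();
}

final class ToyAlwaysTrue extends ToyPredicate {
  @Override
  Boolean value() {
    return Boolean.TRUE;
  }
}

final class ToyAlwaysFalse extends ToyPredicate {
  @Override
  Boolean value() {
    return Boolean.FALSE;
  }
}

final class ToyAlwaysNull extends ToyPredicate {
  @Override
  Boolean value() {
    return null;
  }
}

final class ToyPredicateBuilder {
  // The return type is fixed to the predicate type, so a generic literal
  // cannot be returned here; a dedicated predicate class is needed for NULL.
  static Optional<ToyPredicate> build(ToyCatalystExpr expr) {
    if (expr instanceof ToyBooleanLiteral) {
      ToyPredicate constant =
          ((ToyBooleanLiteral) expr).value ? new ToyAlwaysTrue() : new ToyAlwaysFalse();
      return Optional.of(constant);
    }
    if (expr instanceof ToyCastNullToBoolean) {
      return Optional.of(new ToyAlwaysNull());
    }
    return Optional.empty(); // not translatable
  }
}
```

In this toy model, `ToyPredicateBuilder.build(new ToyCastNullToBoolean())` yields a `ToyAlwaysNull`, whereas an untranslatable expression yields `Optional.empty()`.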

Contributor

Hm, what about the return type, however? Why boolean?

Contributor

I missed that it is a predicate, hence boolean. Let me think a bit more about it.

Contributor

I generally don't mind adding AlwaysNull, but I am not sure there is value in allowing CHECK (null), to be honest. I know some databases accept it, but it seems useless. Are there databases that reject or ignore these? Or do most databases support them?

Member Author

Databases that support CHECK(null):

  • PostgreSQL
  • Oracle

Databases that don't support CHECK(null):

  • MySQL: ERROR 3813 (HY000) at line 2: Column check constraint 'product_chk_1' references other column.
  • SQL Server: the bare token NULL has no boolean meaning (error: "An expression of non-boolean type specified in a context where a condition is expected")
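
For context on what an accepted CHECK(null) means: under standard SQL semantics, a CHECK constraint rejects a row only when its condition evaluates to FALSE, so both TRUE and NULL (unknown) let the row through. The following Java sketch models that rule with a nullable `Boolean`; the class and method names are hypothetical, and it assumes the engine follows the standard three-valued rule.

```java
// Hypothetical helper modeling standard SQL CHECK semantics; not a Spark class.
final class CheckConstraintSketch {
  // conditionResult is the evaluated CHECK condition; null models SQL NULL.
  // A row is rejected only when the condition is definitely FALSE.
  static boolean rowPasses(Boolean conditionResult) {
    return conditionResult == null || conditionResult;
  }
}
```

Under this rule, a constant NULL condition passes for every row (`rowPasses(null)` is `true`), which is why such a constraint is legal in the databases that accept it but never filters anything.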

import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation

class ResolveTableConstraints(val catalogManager: CatalogManager) extends Rule[LogicalPlan] {

override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsWithPruning(
_.containsPattern(COMMAND), ruleId) {
case r: RowLevelWrite if r.operation.command() == RowLevelOperation.Command.DELETE =>
Contributor

Can we please check what Delta Lake does in DELETE operations if DVs are disabled?

Member Author

I moved this part of the code to #51251


@Override
public DataType dataType() {
return DataTypes.BooleanType;
Contributor

The formatting in this file is off. It should use 2 spaces on new lines.
