[SPARK-52439][SQL] Support creating check constraint with NULL #51146
cc @aokolnychyi as well
@@ -78,6 +78,7 @@ class V2ExpressionBuilder(e: Expression, isPredicate: Boolean = false) extends L
      expr: Expression, isPredicate: Boolean = false): Option[V2Expression] = expr match {
    case Literal(true, BooleanType) => Some(new AlwaysTrue())
    case Literal(false, BooleanType) => Some(new AlwaysFalse())
    case Cast(Literal(null, NullType), BooleanType, _, _) if isPredicate => Some(new AlwaysNull())
Why is it not constant-folded to Literal(null, BooleanType)?
This happens in the analyzer rule ResolveTableSpec.
If we proactively constant-fold it, the current-like expressions won't be resolved.
Hm, doing the conversion to a V2 predicate in ResolveTableSpec means we will not be able to optimize the conditions before the conversion... This is unfortunate.
This is actually quite problematic, and the same applies to default values. We should optimize/fold some of these expressions in the optimizer before converting to DSv2.
I think the v2 Predicate is a big mistake... We never know statically whether an expression is a predicate. In fact, Catalyst only requires an expression to be boolean-typed for it to be used as a predicate.
In this case, a null literal can be a predicate and a CAST to boolean is a predicate, but neither extends the v2 Predicate class, so we can't make the code compile.
It's too late to remove v2 Predicate, as that ship has already sailed, but we can silently kill it by adding a new BooleanExpression v2 expression that extends Predicate and simply wraps any v2 expression that returns a boolean type.
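The wrapper idea can be sketched in isolation. This is a toy model only: `ToyExpression`, `ToyPredicate`, `ToyBooleanExpression`, `ToyCastNullToBoolean`, and `ToyIntLiteral` are hypothetical names invented for illustration, not Spark's v2 classes, and the string-based type check is an assumption made to keep the example small.

```java
// Toy stand-ins for the v2 expression hierarchy. None of these are Spark classes.
interface ToyExpression {
  String dataType();   // simplified type name, e.g. "boolean" or "int"
  String describe();   // SQL-like rendering of the expression
}

// Stands in for the v2 Predicate class: a type that callers require statically.
abstract class ToyPredicate implements ToyExpression {
  @Override
  public final String dataType() {
    return "boolean";
  }
}

// Hypothetical wrapper: lets any boolean-typed expression be used as a predicate.
final class ToyBooleanExpression extends ToyPredicate {
  private final ToyExpression child;

  ToyBooleanExpression(ToyExpression child) {
    // The check moves from the class hierarchy to the expression's data type.
    if (!"boolean".equals(child.dataType())) {
      throw new IllegalArgumentException(
          "expected a boolean-typed expression, got " + child.dataType());
    }
    this.child = child;
  }

  ToyExpression child() {
    return child;
  }

  @Override
  public String describe() {
    return child.describe();
  }
}

// Boolean-typed, but not a predicate by class: models CAST(NULL AS BOOLEAN).
final class ToyCastNullToBoolean implements ToyExpression {
  @Override
  public String dataType() {
    return "boolean";
  }

  @Override
  public String describe() {
    return "CAST(NULL AS BOOLEAN)";
  }
}

// Not boolean-typed: the wrapper must reject it.
final class ToyIntLiteral implements ToyExpression {
  private final int value;

  ToyIntLiteral(int value) {
    this.value = value;
  }

  @Override
  public String dataType() {
    return "int";
  }

  @Override
  public String describe() {
    return Integer.toString(value);
  }
}

final class BooleanWrapperDemo {
  public static void main(String[] args) {
    ToyPredicate p = new ToyBooleanExpression(new ToyCastNullToBoolean());
    System.out.println(p.describe() + " : " + p.dataType());
  }
}
```

With a wrapper of this shape, code that demands a predicate type still compiles, while the decision about what counts as a predicate is made from the data type rather than from the class of the expression.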
Please take a look at #51247
 * @since 4.1.0
 */
@Evolving
public final class AlwaysNull extends Predicate implements Literal<Boolean> {
Do we have to add it? Can v2 Literal do the work?
It has to be a V2 Predicate:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala#L58
Also, having AlwaysNull besides AlwaysTrue and AlwaysFalse is reasonable.
Hm, what about the return type, however? Why boolean?
I missed that it is a predicate, hence boolean. Let me think a bit more about it.
I generally don't mind adding AlwaysNull but I am not sure there is value in allowing CHECK (null), to be honest. I know some databases accept it but it seems useless. Are there databases that reject or ignore these? Or most databases support them?
Databases that support CHECK(null):
- PostgreSQL
- Oracle
Databases that don't support CHECK(null):
- MySQL:
  ERROR 3813 (HY000) at line 2: Column check constraint 'product_chk_1' references other column.
- SQL Server: The bare token NULL has no boolean meaning:
  An expression of non-boolean type specified in a context where a condition is expected
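For context on what CHECK(null) means where it is accepted: under standard SQL semantics, a check constraint rejects a row only when its condition evaluates to FALSE, so a condition that is always NULL (unknown) never rejects anything. The sketch below models that rule with a hypothetical `CheckSemantics` helper, using a Java `Boolean` whose `null` stands in for SQL NULL. It illustrates the three-valued rule and is not Spark's implementation.

```java
import java.util.Arrays;
import java.util.List;

final class CheckSemantics {
  // A CHECK constraint is violated only by FALSE; TRUE and NULL (unknown) both pass.
  static boolean satisfied(Boolean conditionResult) {
    return conditionResult == null || conditionResult;
  }

  // Contrast: a WHERE filter keeps a row only when the condition is TRUE.
  static boolean keptByFilter(Boolean conditionResult) {
    return conditionResult != null && conditionResult;
  }

  public static void main(String[] args) {
    // null models the result of evaluating CHECK(null) for any row.
    List<Boolean> results = Arrays.asList(Boolean.TRUE, Boolean.FALSE, null);
    for (Boolean r : results) {
      System.out.println(r + " -> check satisfied: " + satisfied(r)
          + ", kept by WHERE: " + keptByFilter(r));
    }
  }
}
```

Under this rule CHECK(null) behaves like a constraint that every row passes, which is consistent with the concern above that accepting it adds little value.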
import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation

class ResolveTableConstraints(val catalogManager: CatalogManager) extends Rule[LogicalPlan] {

  override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsWithPruning(
    _.containsPattern(COMMAND), ruleId) {
    case r: RowLevelWrite if r.operation.command() == RowLevelOperation.Command.DELETE =>
Can we please check what Delta Lake does in DELETE operations if DVs are disabled?
I moved this part of the code to #51251
@Override
public DataType dataType() {
return DataTypes.BooleanType;
The formatting in this file is off. It should use 2 spaces on new lines.
What changes were proposed in this pull request?
Why are the changes needed?
Before this change, creating a check constraint with null, check(null), caused a compilation error. This PR fixes the issue.
Does this PR introduce any user-facing change?
No
How was this patch tested?
New UT
Was this patch authored or co-authored using generative AI tooling?
No