Commit cabba33
[SPARK-43061][CORE][SQL] Introduce PartitionEvaluator for SQL operator execution
### What changes were proposed in this pull request?

This PR adds a new API `PartitionEvaluator` to define the computing logic and requires the caller side to explicitly list what needs to be serialized and sent to executors via `PartitionEvaluatorFactory`.

Two new RDD APIs are added to use `PartitionEvaluator`:

```scala
/**
 * Return a new RDD by applying an evaluator to each partition of this RDD. The given evaluator
 * factory will be serialized and sent to executors, and each task will create an evaluator with
 * the factory, and use the evaluator to transform the data of the input partition.
 */
@DeveloperApi
@Since("3.5.0")
def mapPartitionsWithEvaluator[U: ClassTag](
    partitionEvaluatorFactory: PartitionEvaluatorFactory[T, U]): RDD[U] = withScope {
  new MapPartitionsWithEvaluatorRDD(this, partitionEvaluatorFactory)
}

/**
 * Zip this RDD's partitions with another RDD and return a new RDD by applying an evaluator to
 * the zipped partitions. Assumes that the two RDDs have the *same number of partitions*, but
 * does *not* require them to have the same number of elements in each partition.
 */
@DeveloperApi
@Since("3.5.0")
def zipPartitionsWithEvaluator[U: ClassTag](
    rdd2: RDD[T],
    partitionEvaluatorFactory: PartitionEvaluatorFactory[T, U]): RDD[U] = withScope {
  new ZippedPartitionsWithEvaluatorRDD(this, rdd2, partitionEvaluatorFactory)
}
```

Three SQL operators are updated to use the new API for execution, as a showcase: Project, Filter, and WholeStageCodegen. More operators can be migrated later. A config is added so that the old code path is still used by default.

### Why are the changes needed?

Using a lambda to define the computing logic is a bit tricky:
1. It is easy to mistakenly reference objects in the closure, which increases the time to serialize the lambda and send it to executors. `ProjectExec` and `FilterExec` use `child.output` in the lambda, which means the entire `child` will be serialized. Other places try to work around this problem, e.g. https://github.com/apache/spark/blob/v3.3.2/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala#L90-L92
2. Serializing lambdas is strongly discouraged by the [official Java guide](https://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexpressions.html#serialization).

We should eventually get rid of lambdas during distributed execution to make Spark more robust.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests.

Closes apache#40697 from cloud-fan/serde.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
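For illustration, here is a minimal end-to-end sketch of the new API. The factory name and data are hypothetical (not part of this commit); it assumes a `SparkContext` named `sc`:

```scala
import org.apache.spark.{PartitionEvaluator, PartitionEvaluatorFactory}

// Hypothetical factory: the fields it holds (here, just `suffix`) are exactly
// what gets serialized and shipped to executors -- no accidental closure capture.
class AppendSuffixEvaluatorFactory(suffix: String)
  extends PartitionEvaluatorFactory[String, String] {

  override def createEvaluator(): PartitionEvaluator[String, String] =
    new PartitionEvaluator[String, String] {
      override def eval(partitionIndex: Int, inputs: Iterator[String]*): Iterator[String] = {
        assert(inputs.length == 1)
        inputs.head.map(_ + suffix)
      }
    }
}

// Driver side: no lambda is serialized, only the factory instance.
// val out = sc.parallelize(Seq("a", "b")).mapPartitionsWithEvaluator(new AppendSuffixEvaluatorFactory("!"))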
1 parent 99d2932 commit cabba33

11 files changed: +360 −37 lines changed
core/src/main/scala/org/apache/spark/PartitionEvaluator.scala

Lines changed: 36 additions & 0 deletions

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark

import org.apache.spark.annotation.{DeveloperApi, Since}

/**
 * An evaluator for computing RDD partitions. Spark serializes and sends
 * [[PartitionEvaluatorFactory]] to executors, and then creates [[PartitionEvaluator]] via the
 * factory at the executor side.
 */
@DeveloperApi
@Since("3.5.0")
trait PartitionEvaluator[T, U] {

  /**
   * Evaluates the RDD partition at the given index. There can be more than one input iterator,
   * if the RDD was zipped from multiple RDDs.
   */
  def eval(partitionIndex: Int, inputs: Iterator[T]*): Iterator[U]
}
```
core/src/main/scala/org/apache/spark/PartitionEvaluatorFactory.scala

Lines changed: 38 additions & 0 deletions

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark

import java.io.Serializable

import org.apache.spark.annotation.{DeveloperApi, Since}

/**
 * A factory to create [[PartitionEvaluator]]. Spark serializes and sends
 * [[PartitionEvaluatorFactory]] to executors, and then creates [[PartitionEvaluator]] via the
 * factory at the executor side.
 */
@DeveloperApi
@Since("3.5.0")
trait PartitionEvaluatorFactory[T, U] extends Serializable {

  /**
   * Creates a partition evaluator. Each RDD partition will create one evaluator instance, which
   * means one evaluator instance will be used by only one thread.
   */
  def createEvaluator(): PartitionEvaluator[T, U]
}
```
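Because `createEvaluator()` is called once per partition and each evaluator instance is used by a single thread, an evaluator can safely hold per-partition mutable state. A hypothetical sketch (names are illustrative only, not part of this commit):

```scala
import org.apache.spark.{PartitionEvaluator, PartitionEvaluatorFactory}

// Hypothetical pass-through evaluator that counts the rows of its partition.
class CountingEvaluatorFactory extends PartitionEvaluatorFactory[Long, Long] {
  override def createEvaluator(): PartitionEvaluator[Long, Long] =
    new PartitionEvaluator[Long, Long] {
      // Safe: this instance serves exactly one partition on one thread.
      private var count = 0L

      override def eval(partitionIndex: Int, inputs: Iterator[Long]*): Iterator[Long] =
        inputs.head.map { v => count += 1; v }
    }
}
```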
core/src/main/scala/org/apache/spark/rdd/MapPartitionsWithEvaluatorRDD.scala

Lines changed: 41 additions & 0 deletions

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.rdd

import scala.reflect.ClassTag

import org.apache.spark.{Partition, PartitionEvaluatorFactory, TaskContext}

private[spark] class MapPartitionsWithEvaluatorRDD[T : ClassTag, U : ClassTag](
    var prev: RDD[T],
    evaluatorFactory: PartitionEvaluatorFactory[T, U])
  extends RDD[U](prev) {

  override def getPartitions: Array[Partition] = firstParent[T].partitions

  override def compute(split: Partition, context: TaskContext): Iterator[U] = {
    val evaluator = evaluatorFactory.createEvaluator()
    val input = firstParent[T].iterator(split, context)
    evaluator.eval(split.index, input)
  }

  override def clearDependencies(): Unit = {
    super.clearDependencies()
    prev = null
  }
}
```

core/src/main/scala/org/apache/spark/rdd/RDD.scala

Lines changed: 25 additions & 0 deletions

```diff
@@ -908,6 +908,31 @@ abstract class RDD[T: ClassTag](
       preservesPartitioning)
   }
 
+  /**
+   * Return a new RDD by applying an evaluator to each partition of this RDD. The given evaluator
+   * factory will be serialized and sent to executors, and each task will create an evaluator with
+   * the factory, and use the evaluator to transform the data of the input partition.
+   */
+  @DeveloperApi
+  @Since("3.5.0")
+  def mapPartitionsWithEvaluator[U: ClassTag](
+      evaluatorFactory: PartitionEvaluatorFactory[T, U]): RDD[U] = withScope {
+    new MapPartitionsWithEvaluatorRDD(this, evaluatorFactory)
+  }
+
+  /**
+   * Zip this RDD's partitions with another RDD and return a new RDD by applying an evaluator to
+   * the zipped partitions. Assumes that the two RDDs have the *same number of partitions*, but
+   * does *not* require them to have the same number of elements in each partition.
+   */
+  @DeveloperApi
+  @Since("3.5.0")
+  def zipPartitionsWithEvaluator[U: ClassTag](
+      rdd2: RDD[T],
+      evaluatorFactory: PartitionEvaluatorFactory[T, U]): RDD[U] = withScope {
+    new ZippedPartitionsWithEvaluatorRDD(this, rdd2, evaluatorFactory)
+  }
+
   /**
    * Return a new RDD by applying a function to each partition of this RDD, while tracking the index
    * of the original partition.
```
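As a usage sketch for the two-input variant (hypothetical factory, not part of this commit), the evaluator receives one iterator per zipped RDD, in order:

```scala
import org.apache.spark.{PartitionEvaluator, PartitionEvaluatorFactory}

// Hypothetical evaluator that reduces each pair of zipped partitions to one value.
class SumZippedPartitionsFactory extends PartitionEvaluatorFactory[Int, Int] {
  override def createEvaluator(): PartitionEvaluator[Int, Int] =
    new PartitionEvaluator[Int, Int] {
      override def eval(partitionIndex: Int, inputs: Iterator[Int]*): Iterator[Int] = {
        assert(inputs.length == 2)  // one iterator per zipped RDD
        Iterator.single(inputs(0).sum + inputs(1).sum)
      }
    }
}

// Driver side, assuming rdd1 and rdd2 have the same number of partitions:
// val sums = rdd1.zipPartitionsWithEvaluator(rdd2, new SumZippedPartitionsFactory)
```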
core/src/main/scala/org/apache/spark/rdd/ZippedPartitionsWithEvaluatorRDD.scala

Lines changed: 38 additions & 0 deletions

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.rdd

import scala.reflect.ClassTag

import org.apache.spark.{Partition, PartitionEvaluatorFactory, TaskContext}

private[spark] class ZippedPartitionsWithEvaluatorRDD[T : ClassTag, U : ClassTag](
    var rdd1: RDD[T],
    var rdd2: RDD[T],
    evaluatorFactory: PartitionEvaluatorFactory[T, U])
  extends ZippedPartitionsBaseRDD[U](rdd1.context, List(rdd1, rdd2)) {

  override def compute(split: Partition, context: TaskContext): Iterator[U] = {
    val evaluator = evaluatorFactory.createEvaluator()
    val partitions = split.asInstanceOf[ZippedPartitionsPartition].partitions
    evaluator.eval(
      split.index,
      rdd1.iterator(partitions(0), context),
      rdd2.iterator(partitions(1), context))
  }
}
```

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

Lines changed: 9 additions & 0 deletions

```diff
@@ -1808,6 +1808,13 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)
 
+  val USE_PARTITION_EVALUATOR = buildConf("spark.sql.execution.usePartitionEvaluator")
+    .internal()
+    .doc("When true, use PartitionEvaluator to execute SQL operators.")
+    .version("3.5.0")
+    .booleanConf
+    .createWithDefault(false)
+
   val STATE_STORE_PROVIDER_CLASS =
     buildConf("spark.sql.streaming.stateStore.providerClass")
       .internal()
@@ -5028,6 +5035,8 @@ class SQLConf extends Serializable with Logging {
   def allowsTempViewCreationWithMultipleNameparts: Boolean =
     getConf(SQLConf.ALLOW_TEMP_VIEW_CREATION_WITH_MULTIPLE_NAME_PARTS)
 
+  def usePartitionEvaluator: Boolean = getConf(SQLConf.USE_PARTITION_EVALUATOR)
+
   /** ********************** SQLConf functionality methods ************ */
 
   /** Set Spark SQL configuration properties. */
```
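Since the flag defaults to false, the evaluator-based path must be opted into explicitly, for example (assuming a SparkSession named `spark`):

```scala
// The config is internal; it can still be set programmatically or via SQL.
spark.conf.set("spark.sql.execution.usePartitionEvaluator", "true")
// or: spark.sql("SET spark.sql.execution.usePartitionEvaluator=true")
```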
sql/core/src/main/scala/org/apache/spark/sql/execution/FilterEvaluatorFactory.scala

Lines changed: 48 additions & 0 deletions

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.execution

import org.apache.spark.{PartitionEvaluator, PartitionEvaluatorFactory}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression, Predicate}
import org.apache.spark.sql.execution.metric.SQLMetric

class FilterEvaluatorFactory(
    condition: Expression,
    childOutput: Seq[Attribute],
    numOutputRows: SQLMetric) extends PartitionEvaluatorFactory[InternalRow, InternalRow] {

  override def createEvaluator(): PartitionEvaluator[InternalRow, InternalRow] = {
    new FilterPartitionEvaluator
  }

  class FilterPartitionEvaluator extends PartitionEvaluator[InternalRow, InternalRow] {
    override def eval(
        partitionIndex: Int,
        inputs: Iterator[InternalRow]*): Iterator[InternalRow] = {
      assert(inputs.length == 1)
      val predicate = Predicate.create(condition, childOutput)
      predicate.initialize(partitionIndex)
      inputs.head.filter { row =>
        val r = predicate.eval(row)
        if (r) numOutputRows += 1
        r
      }
    }
  }
}
```
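Note that `Predicate.create` runs inside `eval`, i.e. on the executor once per partition, so only `condition`, `childOutput`, and the metric are serialized. A hypothetical direct use of the factory (normally `FilterExec` constructs it; assumes a SparkContext `sc`):

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, GreaterThan, Literal}
import org.apache.spark.sql.execution.metric.SQLMetrics
import org.apache.spark.sql.types.IntegerType

val a = AttributeReference("a", IntegerType)()
val numOutputRows = SQLMetrics.createMetric(sc, "number of output rows")
val factory = new FilterEvaluatorFactory(GreaterThan(a, Literal(1)), Seq(a), numOutputRows)

// Executor side: one evaluator per partition.
val evaluator = factory.createEvaluator()
val out = evaluator.eval(0, Iterator(InternalRow(0), InternalRow(2)))
// `out` yields only the row with a = 2, and numOutputRows is incremented once.
```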
sql/core/src/main/scala/org/apache/spark/sql/execution/ProjectEvaluatorFactory.scala

Lines changed: 41 additions & 0 deletions

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.execution

import org.apache.spark.{PartitionEvaluator, PartitionEvaluatorFactory}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression, UnsafeProjection}

class ProjectEvaluatorFactory(projectList: Seq[Expression], childOutput: Seq[Attribute])
  extends PartitionEvaluatorFactory[InternalRow, InternalRow] {

  override def createEvaluator(): PartitionEvaluator[InternalRow, InternalRow] = {
    new ProjectPartitionEvaluator
  }

  class ProjectPartitionEvaluator extends PartitionEvaluator[InternalRow, InternalRow] {
    override def eval(
        partitionIndex: Int,
        inputs: Iterator[InternalRow]*): Iterator[InternalRow] = {
      assert(inputs.length == 1)
      val project = UnsafeProjection.create(projectList, childOutput)
      project.initialize(partitionIndex)
      inputs.head.map(project)
    }
  }
}
```
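Similarly, `UnsafeProjection.create` is invoked per partition on the executor, so only `projectList` and `childOutput` travel over the wire rather than a whole child plan. A hypothetical direct use (normally `ProjectExec` constructs the factory):

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Add, AttributeReference, Literal}
import org.apache.spark.sql.types.IntegerType

val a = AttributeReference("a", IntegerType)()
val factory = new ProjectEvaluatorFactory(Seq(Add(a, Literal(1))), Seq(a))

val evaluator = factory.createEvaluator()
val out = evaluator.eval(0, Iterator(InternalRow(41)))
// out.next().getInt(0) == 42
```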
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenEvaluatorFactory.scala

Lines changed: 51 additions & 0 deletions

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.execution

import org.apache.spark.{PartitionEvaluator, PartitionEvaluatorFactory}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.codegen.{CodeAndComment, CodeGenerator}
import org.apache.spark.sql.execution.metric.SQLMetric

class WholeStageCodegenEvaluatorFactory(
    cleanedSource: CodeAndComment,
    durationMs: SQLMetric,
    references: Array[Any]) extends PartitionEvaluatorFactory[InternalRow, InternalRow] {

  override def createEvaluator(): PartitionEvaluator[InternalRow, InternalRow] = {
    new WholeStageCodegenPartitionEvaluator()
  }

  class WholeStageCodegenPartitionEvaluator extends PartitionEvaluator[InternalRow, InternalRow] {
    override def eval(
        partitionIndex: Int,
        inputs: Iterator[InternalRow]*): Iterator[InternalRow] = {
      val (clazz, _) = CodeGenerator.compile(cleanedSource)
      val buffer = clazz.generate(references).asInstanceOf[BufferedRowIterator]
      buffer.init(partitionIndex, inputs.toArray)
      new Iterator[InternalRow] {
        override def hasNext: Boolean = {
          val v = buffer.hasNext
          if (!v) durationMs += buffer.durationMs()
          v
        }
        override def next: InternalRow = buffer.next()
      }
    }
  }
}
```
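Here `CodeGenerator.compile` also runs inside `eval`, so only the generated source, `references`, and the metric are serialized; compilation happens on the executor per task (and is cached by the code generator). A hedged end-to-end sketch, assuming a SparkSession `spark` built from a branch containing this commit:

```scala
// With the flag on and whole-stage codegen enabled (the default),
// WholeStageCodegenExec routes execution through this factory.
spark.conf.set("spark.sql.execution.usePartitionEvaluator", "true")
spark.range(0, 10).selectExpr("id + 1").collect()
```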
