Import Model from Apache Spark
This is legacy Apache Ignite documentation. The new documentation is hosted here: https://ignite.apache.org/docs/latest/
Model import from Apache Spark via Parquet files
Starting with Ignite 2.8, it is possible to import the following Apache Spark ML models:

- Logistic regression (org.apache.spark.ml.classification.LogisticRegressionModel)
- Linear regression (org.apache.spark.ml.regression.LinearRegressionModel)
- Decision tree (org.apache.spark.ml.classification.DecisionTreeClassificationModel)
- Support Vector Machine (org.apache.spark.ml.classification.LinearSVCModel)
- Random forest (org.apache.spark.ml.classification.RandomForestClassificationModel)
- K-Means (org.apache.spark.ml.clustering.KMeansModel)
- Decision tree regression (org.apache.spark.ml.regression.DecisionTreeRegressionModel)
- Random forest regression (org.apache.spark.ml.regression.RandomForestRegressionModel)
- Gradient boosted trees regression (org.apache.spark.ml.regression.GBTRegressionModel)
- Gradient boosted trees (org.apache.spark.ml.classification.GBTClassificationModel)
This feature works with models saved as snappy-compressed Parquet (snappy.parquet) files.

Supported and tested Spark version: 2.3.0. Spark versions 2.1, 2.2, 2.3, and 2.4 may also work, but have not been tested.
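For orientation, a model saved with Spark's model.write.save(path) is typically laid out on disk as shown below; the part files under data/ are the snappy.parquet files the parser reads (file names here are illustrative):

```
/home/models/titanic/linreg
├── data
│   └── part-00000-....snappy.parquet
└── metadata
    └── part-00000
```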
To export a model from Spark ML, save the model built as a result of training to a Parquet file, as in the example below:
```scala
val spark: SparkSession = TitanicUtils.getSparkSession

val passengers = TitanicUtils.readPassengersWithCasting(spark)
  .select("survived", "pclass", "sibsp", "parch", "sex", "embarked", "age")

// Step 1: Assemble the dataframe's columns into a feature vector with VectorAssembler
val assembler = new VectorAssembler()
  .setInputCols(Array("pclass", "sibsp", "parch", "survived"))
  .setOutputCol("features")

// Step 2: Transform the dataframe into a vectorized dataframe, dropping rows with missing values
val output = assembler.transform(
  passengers.na.drop(Array("pclass", "sibsp", "parch", "survived", "age"))
).select("features", "age")

val lr = new LinearRegression()
  .setMaxIter(100)
  .setRegParam(0.1)
  .setElasticNetParam(0.1)
  .setLabelCol("age")
  .setFeaturesCol("features")

// Fit the model
val model = lr.fit(output)

model.write.overwrite().save("/home/models/titanic/linreg")
```

To load the model in Ignite ML, use the SparkModelParser class via a parse() method call:
```java
DecisionTreeNode mdl = (DecisionTreeNode)SparkModelParser.parse(
    SPARK_MDL_PATH,
    SupportedSparkModels.DECISION_TREE
);
```

You can see more examples of using this API in the examples module, in the package org.apache.ignite.examples.ml.inference.spark.modelparser.
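Once parsed, the returned object is a regular Ignite ML model and can be used for local inference. A minimal sketch, assuming a decision tree trained on the four features assembled in the Spark example above (the feature values are illustrative):

```java
import org.apache.ignite.ml.math.primitives.vector.Vector;
import org.apache.ignite.ml.math.primitives.vector.VectorUtils;

// The feature order must match the order passed to Spark's VectorAssembler
// ("pclass", "sibsp", "parch", "survived" in the training example above).
Vector observation = VectorUtils.of(1.0, 0.0, 2.0, 1.0);

// DecisionTreeNode implements IgniteModel<Vector, Double>, so prediction runs locally
double prediction = mdl.predict(observation);
```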
NOTE: Loading from a Spark PipelineModel is not supported. Intermediate feature transformers from Spark are also not supported, due to the different nature of preprocessing on the Ignite and Spark sides.
