
Commit cfff397

viirya authored and jkbradley committed
[SPARK-6004][MLlib] Pick the best model when training GradientBoostedTrees with validation
Since the validation error does not change monotonically, in practice it is more appropriate to pick the best model when training GradientBoostedTrees with validation, instead of stopping training early.

Author: Liang-Chi Hsieh <[email protected]>

Closes apache#4763 from viirya/gbt_record_model and squashes the following commits:

452e049 [Liang-Chi Hsieh] Address comment.
ea2fae2 [Liang-Chi Hsieh] Pick the best model when training GradientBoostedTrees with validation.
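The rationale above can be illustrated with a small, self-contained Scala sketch (not Spark code). It assumes the validation error after each boosting iteration has already been recorded in a sequence, where index 0 is the model with one tree. The hypothetical helper `bestNumTrees` returns the size of the prefix with the lowest error. Because the error is not monotonic, that prefix can end before the last iteration, which is the situation that keeping only the first `bestM` learners addresses.

```scala
object BestPrefixSketch {
  // Hypothetical helper (not a Spark API): given the validation error measured
  // after each boosting iteration (index 0 = ensemble with 1 tree), return how
  // many trees the best-scoring ensemble contains.
  def bestNumTrees(validationErrors: Seq[Double]): Int = {
    require(validationErrors.nonEmpty, "need at least one iteration")
    // indexOf returns the first occurrence, so ties favor the smaller ensemble.
    val bestIndex = validationErrors.indexOf(validationErrors.min)
    bestIndex + 1
  }

  def main(args: Array[String]): Unit = {
    // Non-monotonic errors: the error rises at the third iteration, then drops again.
    val errors = Seq(0.50, 0.40, 0.42, 0.35, 0.37)
    println(bestNumTrees(errors)) // prints 4
  }
}
```

Stopping at the first increase (0.40 to 0.42) would keep only 2 trees here, while scanning for the minimum keeps 4.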
1 parent 2358657 commit cfff397


1 file changed (+9, -3 lines)


mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoostedTrees.scala

Lines changed: 9 additions & 3 deletions
@@ -251,9 +251,15 @@ object GradientBoostedTrees extends Logging {

     logInfo("Internal timing for DecisionTree:")
     logInfo(s"$timer")
-
-    new GradientBoostedTreesModel(
-      boostingStrategy.treeStrategy.algo, baseLearners, baseLearnerWeights)
+    if (validate) {
+      new GradientBoostedTreesModel(
+        boostingStrategy.treeStrategy.algo,
+        baseLearners.slice(0, bestM),
+        baseLearnerWeights.slice(0, bestM))
+    } else {
+      new GradientBoostedTreesModel(
+        boostingStrategy.treeStrategy.algo, baseLearners, baseLearnerWeights)
+    }
   }

 }
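The new `if (validate)` branch can be mimicked outside Spark with a toy ensemble. In this sketch, `ToyEnsemble`, `finalModel`, and the function-valued "trees" are hypothetical stand-ins for `GradientBoostedTreesModel` and its base learners, and prediction is assumed to be a plain weighted sum of learner outputs. The point it shows is that `slice(0, bestM)` keeps the first `bestM` learners and the first `bestM` weights, so the two arrays stay aligned.

```scala
object PrefixSliceSketch {
  // Hypothetical stand-in for GradientBoostedTreesModel: each "tree" is a function.
  final case class ToyEnsemble(trees: Array[Double => Double], weights: Array[Double]) {
    require(trees.length == weights.length, "one weight per tree")
    // Weighted sum of base-learner outputs (an assumption for this toy model).
    def predict(x: Double): Double =
      trees.indices.map(i => weights(i) * trees(i)(x)).sum
  }

  // Mirrors the if/else in the diff: with validation, keep only the first bestM
  // learners and weights; otherwise keep every learner that was trained.
  def finalModel(
      trees: Array[Double => Double],
      weights: Array[Double],
      validate: Boolean,
      bestM: Int): ToyEnsemble =
    if (validate) ToyEnsemble(trees.slice(0, bestM), weights.slice(0, bestM))
    else ToyEnsemble(trees, weights)

  def main(args: Array[String]): Unit = {
    val trees: Array[Double => Double] = Array(x => x, _ => 1.0, _ => -1.0)
    val weights = Array(1.0, 0.5, 0.5)
    val pruned = finalModel(trees, weights, validate = true, bestM = 2)
    val full = finalModel(trees, weights, validate = false, bestM = 2)
    println(pruned.trees.length) // prints 2
    println(pruned.predict(2.0)) // prints 2.5
    println(full.predict(2.0))   // prints 2.0
  }
}
```

Slicing both arrays with the same bounds is what keeps each weight paired with its tree; slicing only one of them would fail the `require` check in this sketch.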
