-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Tree based trainers implement ICanGetSummaryAsIDataView #3892
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -3378,7 +3361,7 @@ public TreeNode(Dictionary<string, object> keyValues) | |||
/// and <see cref="TreeEnsembleModelParametersBasedOnRegressionTree"/> is the type of | |||
/// <see cref="TrainedTreeEnsemble"/>. | |||
/// </summary> | |||
public abstract class TreeEnsembleModelParametersBasedOnRegressionTree : TreeEnsembleModelParameters | |||
public abstract class TreeEnsembleModelParametersBasedOnRegressionTree : TreeEnsembleModelParameters, ICanGetSummaryAsIDataView |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need ICanGetSummaryAs...
? IDataView RegressionTreeEnsembleAsIDataView(..)
looks sufficient for generating a summary. #ByDesign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the interface that Summarize entrypoint uses to get the IDataView from a trainer.
See:
public static CommonOutputs.SummaryOutput Summarize(IHostEnvironment env, SummarizePredictor.Input input) |
And:
internal static IDataView GetSummaryAndStats(IHostEnvironment env, IPredictor predictor, RoleMappedSchema schema, out IDataView stats) |
81a5d39
to
9b0e097
Compare
/// Used for the Summarize entrypoint. | ||
/// </summary> | ||
IDataView ICanGetSummaryAsIDataView.GetSummaryDataView(RoleMappedSchema schema) | ||
=> RegressionTreeBaseUtils.RegressionTreeEnsembleAsIDataView(Host, TrainedTreeEnsemble.Bias, TrainedTreeEnsemble.TreeWeights, TrainedTreeEnsemble.Trees); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indent
[LightGBMFact] | ||
public void LightGbmBinaryClassificationTestSummary() | ||
{ | ||
var (pipeline, dataView) = GetBinaryClassificationPipeline(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetBinaryClassificationPipeline [](start = 39, length = 31)
Make sure that these work with the categorical split after one hot encoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We reviewed this in person and mostly looks good. Signing off assuming you will be fix the output of the summary to not display negative values for node ids and instead use 2s complement so that it is easier to understand the tree structure.
d6bf2c4
to
78f4f86
Compare
Fixes #3755.
NimbusML did not have access to the details on the tree structure.
This PR implements the
ICanGetSummaryAsIDataView
interface which is used in theSummarize
entrypoint to pass a summary of the model parameters to NimbusML in the form of anIDataView
.I create a utility method that does the conversion from
RegressionTreeBase
toIDataView
with special treatment forQuantileRegressionTrees
which have additional information.In the
IDataView
, each node has its own row and the columns correspond to the fields describing each node. To determine which tree the node belongs to there is aTreeID
column.