Tree based trainers implement ICanGetSummaryAsIDataView #3892

Merged
artidoro merged 16 commits into dotnet:master from the fasttreesummary2 branch on Jul 2, 2019

artidoro
Contributor

Fixes #3755.

NimbusML did not have access to the details of the tree structure.
This PR implements the ICanGetSummaryAsIDataView interface, which the Summarize entrypoint uses to pass a summary of the model parameters to NimbusML in the form of an IDataView.

I created a utility method that converts a RegressionTreeBase to an IDataView, with special treatment for QuantileRegressionTrees, which carry additional information.

In the IDataView, each node has its own row, and the columns correspond to the fields describing that node. A TreeID column identifies which tree each node belongs to.
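
As a rough illustration of the row layout described above (one row per node, with a TreeID column tying each node to its tree), here is a minimal sketch. `SketchTree`, `NodeRow`, and `TreeSummarySketch.Flatten` are hypothetical names invented for this example; they are not the ML.NET types or the utility method added in this PR, and the real summary presumably carries more columns than the ones shown.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a trained tree; NOT the ML.NET RegressionTreeBase.
public sealed class SketchTree
{
    public int[] LeftChild = new int[0];
    public int[] RightChild = new int[0];
    public double[] Thresholds = new double[0];
}

// One summary row per internal node. The field names are assumptions.
public readonly struct NodeRow
{
    public readonly int TreeId, NodeId, LeftChild, RightChild;
    public readonly double Threshold;

    public NodeRow(int treeId, int nodeId, int leftChild, int rightChild, double threshold)
    {
        TreeId = treeId;
        NodeId = nodeId;
        LeftChild = leftChild;
        RightChild = rightChild;
        Threshold = threshold;
    }
}

public static class TreeSummarySketch
{
    // Flattens an ensemble into rows; TreeId records which tree a node came from.
    public static List<NodeRow> Flatten(IReadOnlyList<SketchTree> trees)
    {
        var rows = new List<NodeRow>();
        for (int t = 0; t < trees.Count; t++)
        {
            SketchTree tree = trees[t];
            for (int n = 0; n < tree.Thresholds.Length; n++)
            {
                rows.Add(new NodeRow(t, n, tree.LeftChild[n], tree.RightChild[n], tree.Thresholds[n]));
            }
        }
        return rows;
    }
}
```

With this layout, a consumer can rebuild any single tree by filtering the rows on TreeId and following the child columns.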

@artidoro artidoro requested review from wschin, yaeldekel and ganik June 20, 2019 17:24
@artidoro artidoro self-assigned this Jun 20, 2019
@@ -3378,7 +3361,7 @@ public TreeNode(Dictionary<string, object> keyValues)
/// and <see cref="TreeEnsembleModelParametersBasedOnRegressionTree"/> is the type of
/// <see cref="TrainedTreeEnsemble"/>.
/// </summary>
- public abstract class TreeEnsembleModelParametersBasedOnRegressionTree : TreeEnsembleModelParameters
+ public abstract class TreeEnsembleModelParametersBasedOnRegressionTree : TreeEnsembleModelParameters, ICanGetSummaryAsIDataView
Member

@wschin wschin Jun 20, 2019

Why do we need ICanGetSummaryAs...? IDataView RegressionTreeEnsembleAsIDataView(..) looks sufficient for generating a summary. #ByDesign

Contributor Author

@artidoro artidoro Jun 21, 2019

That's the interface that the Summarize entrypoint uses to get the IDataView from a trainer.

See:

public static CommonOutputs.SummaryOutput Summarize(IHostEnvironment env, SummarizePredictor.Input input)

And:

internal static IDataView GetSummaryAndStats(IHostEnvironment env, IPredictor predictor, RoleMappedSchema schema, out IDataView stats)
#Resolved
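
To make the point above concrete, here is a minimal sketch of the opt-in interface pattern: a caller that only holds a general predictor reference can discover the summary through a type check, which a standalone static helper would not allow. Every type and member below (`ISketchPredictor`, `ICanGetSummarySketch`, `SummarizeSketch.TrySummarize`, and so on) is a hypothetical stand-in, and the type-check dispatch is an assumption about the general pattern rather than the actual ML.NET entrypoint code.

```csharp
// Hypothetical, simplified stand-ins; NOT the ML.NET interfaces.
public interface ISketchPredictor { }

public interface ICanGetSummarySketch
{
    string GetSummary();
}

// A predictor that opts in to summaries by implementing the interface.
public sealed class TreePredictorSketch : ISketchPredictor, ICanGetSummarySketch
{
    public string GetSummary() => "TreeID,NodeID,...";
}

// A predictor that offers no summary.
public sealed class OpaquePredictorSketch : ISketchPredictor { }

public static class SummarizeSketch
{
    // The caller knows nothing about trees; it only asks whether the
    // predictor supports the summary capability.
    public static bool TrySummarize(ISketchPredictor predictor, out string summary)
    {
        if (predictor is ICanGetSummarySketch s)
        {
            summary = s.GetSummary();
            return true;
        }
        summary = string.Empty;
        return false;
    }
}
```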

@ganik
Member

ganik commented Jun 21, 2019

    public void EntryPointPipelineEnsembleText()

Please add a test using graph JSON text. #Resolved


Refers to: test/Microsoft.ML.Core.Tests/UnitTests/TestEntryPoints.cs:934 in 81a5d39.

/// Used for the Summarize entrypoint.
/// </summary>
IDataView ICanGetSummaryAsIDataView.GetSummaryDataView(RoleMappedSchema schema)
=> RegressionTreeBaseUtils.RegressionTreeEnsembleAsIDataView(Host, TrainedTreeEnsemble.Bias, TrainedTreeEnsemble.TreeWeights, TrainedTreeEnsemble.Trees);
Contributor Author

indent

[LightGBMFact]
public void LightGbmBinaryClassificationTestSummary()
{
var (pipeline, dataView) = GetBinaryClassificationPipeline();
Contributor Author

GetBinaryClassificationPipeline

Make sure that these work with the categorical split after one hot encoding.

@codemzs codemzs self-requested a review June 29, 2019 00:35
Member

@codemzs codemzs left a comment

We reviewed this in person and it mostly looks good. Signing off assuming you will fix the summary output so that it does not display negative values for node IDs and instead uses two's complement, which makes the tree structure easier to understand.
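
For readers puzzled by the negative node IDs, here is a small sketch of how such values could be decoded. It assumes a convention in which a negative child value marks a leaf and its bitwise complement (`~`) gives the leaf index; the thread does not spell out the encoding, so treat that convention, and the `ChildIndexSketch` helper itself, as assumptions made for illustration.

```csharp
// Hypothetical helper; the encoding it decodes is an assumption, not taken from this PR.
public static class ChildIndexSketch
{
    // Under the assumed convention, negative values denote leaves.
    public static bool IsLeaf(int child) => child < 0;

    // Bitwise complement maps -1 to 0, -2 to 1, -3 to 2, and so on.
    public static int LeafIndex(int child) => ~child;

    // Human-readable form for a summary column.
    public static string Describe(int child)
        => child < 0 ? "leaf " + (~child) : "node " + child;
}
```

For example, a child value of -3 would read as "leaf 2", while a child value of 4 would read as "node 4".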

@artidoro artidoro force-pushed the fasttreesummary2 branch from d6bf2c4 to 78f4f86 Compare July 1, 2019 20:31
@artidoro artidoro merged commit d0b3f86 into dotnet:master Jul 2, 2019
Dmitry-A pushed a commit to Dmitry-A/machinelearning that referenced this pull request Jul 24, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 21, 2022
Development

Successfully merging this pull request may close these issues.

Model summary to show tree details for FastTree
4 participants