-
Notifications
You must be signed in to change notification settings - Fork 1.9k
just a mere concept for seed, threads and logging #136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,6 +11,7 @@ | |
using System.Collections.Generic; | ||
using System.Diagnostics; | ||
using System.IO; | ||
using static Microsoft.ML.Runtime.DefaultEnvironment; | ||
|
||
namespace Microsoft.ML | ||
{ | ||
|
@@ -48,15 +49,26 @@ public ScorerPipelineStep(Var<IDataView> data, Var<ITransformModel> model) | |
[DebuggerTypeProxy(typeof(LearningPipelineDebugProxy))] | ||
public class LearningPipeline : ICollection<ILearningPipelineItem> | ||
{ | ||
readonly internal IHostEnvironment Env; | ||
private List<ILearningPipelineItem> Items { get; } = new List<ILearningPipelineItem>(); | ||
|
||
/// <summary> | ||
/// Construct an empty <see cref="LearningPipeline"/> object. | ||
/// </summary> | ||
public LearningPipeline() | ||
public LearningPipeline(int? seed = null, int concurrency = 0) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Hey Ivan, thanks much for addressing this! Seed and concurrency are fine... but in other contexts we've also talked about the need to change the output and error text readers, plus of course the progress indicators. :) The reason I mention this is: whatever solution you come up with that actually captures all these important things will, I am quite certain, wind up being almost completely isomorphic to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Wouldn't it be confusing for user to have method Register, StartProgressChannel and so on for LearningPipeline class? In reply to: 187756624 [](ancestors = 187756624) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I second Tom's sentiment. Why not just take an optional IHostEnvironment argument? There are several benefits in addition to support of the needed functionality:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you have that ability here :) just subscribe to event, and handle messages as you like. I'm actually not sure how its possible with current API and how IHostEnvironment argument will help you in that. In reply to: 187758015 [](ancestors = 187758015) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. and btw Tom suggest to make LearningPipeline:IHostEnvironment and you suggest LearningPipeline(IHostEnvironment env) :) In reply to: 187758499 [](ancestors = 187758499,187758015) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also, if MachineLearning.Net used what's baked in the framework more or less (like MEL), Ivan could configure https://github.com/serilog/serilog-sinks-console to log into console. Or to file, or both, right now. Or use Log4net and do whatever he does there. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @veikkoeeva ... sure, and if we want to come up with an implementation of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @TomFinley I have a hunch that the implementation of For the sake of discussion, one way of seeing a DI framework is that it's a container to which one composes the object graph – as in a tree rooted in the container. Then it allows one to switch parts with given parameters: single instance ever resolved, per-thread instances, per-call instances, instances parametrized in some other way. May seem complicated as written like this, but isn't. One example here: http://autofaccn.readthedocs.io/en/latest/getting-started/#application-startup. For ASP.NET Core, for instance, it's explained at https://docs.microsoft.com/en-us/aspnet/core/fundamentals/dependency-injection?view=aspnetcore-2.0. The benefit is that the object graph is clearly defined. Further, if one makes it so that injection happens via constructors only and the data injected is as much as The current system has some of these benefits, but in unfamiliar form, and mightn't be as flexible as it could with DI. Points:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @TomFinley To answer directly to
Maybe you have an opinion already since you ask an opinion to a small modification to Ivan's work? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi @veikkoeeva , sure. Put another way? Does that solution make people happy or unhappy that want this thing to plug into their favorite logging framework, whatever that may be? If unhappy, maybe an alternate specific solution should be proposed. I think dependency injection is another kettle of fish. You mentioned MEF. That might be a good thing to use, I just don't know. It certainly has many of the features the component catalog needs. It would be nice if there could be a replacement. Whether it's suitable though would be an investigation. |
||
{ | ||
|
||
var env = new DefaultEnvironment(seed: seed, conc: concurrency); | ||
env.MessageRecieved += Env_MessageRecieved; | ||
Env = env; | ||
} | ||
|
||
private void Env_MessageRecieved(object sender, ChannelMessageEventArgs e) | ||
{ | ||
MessageOccured?.Invoke(this, e); | ||
} | ||
|
||
public event EventHandler<ChannelMessageEventArgs> MessageOccured; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a good example of how this class is morphing to become host environment like. We create a facility identical to that of the environment you create, but we choose to rename it from |
||
/// <summary> | ||
/// Get the count of ML components in the <see cref="LearningPipeline"/> object | ||
/// </summary> | ||
|
@@ -137,80 +149,76 @@ public PredictionModel<TInput, TOutput> Train<TInput, TOutput>() | |
where TInput : class | ||
where TOutput : class, new() | ||
{ | ||
Experiment experiment = Env.CreateExperiment(); | ||
ILearningPipelineStep step = null; | ||
List<ILearningPipelineLoader> loaders = new List<ILearningPipelineLoader>(); | ||
List<Var<ITransformModel>> transformModels = new List<Var<ITransformModel>>(); | ||
Var<ITransformModel> lastTransformModel = null; | ||
|
||
using (var environment = new TlcEnvironment()) | ||
foreach (ILearningPipelineItem currentItem in this) | ||
{ | ||
Experiment experiment = environment.CreateExperiment(); | ||
ILearningPipelineStep step = null; | ||
List<ILearningPipelineLoader> loaders = new List<ILearningPipelineLoader>(); | ||
List<Var<ITransformModel>> transformModels = new List<Var<ITransformModel>>(); | ||
Var<ITransformModel> lastTransformModel = null; | ||
if (currentItem is ILearningPipelineLoader loader) | ||
loaders.Add(loader); | ||
|
||
step = currentItem.ApplyStep(step, experiment); | ||
if (step is ILearningPipelineDataStep dataStep && dataStep.Model != null) | ||
transformModels.Add(dataStep.Model); | ||
|
||
foreach (ILearningPipelineItem currentItem in this) | ||
else if (step is ILearningPipelinePredictorStep predictorDataStep) | ||
{ | ||
if (currentItem is ILearningPipelineLoader loader) | ||
loaders.Add(loader); | ||
|
||
step = currentItem.ApplyStep(step, experiment); | ||
if (step is ILearningPipelineDataStep dataStep && dataStep.Model != null) | ||
transformModels.Add(dataStep.Model); | ||
|
||
else if (step is ILearningPipelinePredictorStep predictorDataStep) | ||
if (lastTransformModel != null) | ||
transformModels.Insert(0, lastTransformModel); | ||
|
||
var localModelInput = new Transforms.ManyHeterogeneousModelCombiner | ||
{ | ||
if (lastTransformModel != null) | ||
transformModels.Insert(0, lastTransformModel); | ||
|
||
var localModelInput = new Transforms.ManyHeterogeneousModelCombiner | ||
{ | ||
PredictorModel = predictorDataStep.Model, | ||
TransformModels = new ArrayVar<ITransformModel>(transformModels.ToArray()) | ||
}; | ||
|
||
var localModelOutput = experiment.Add(localModelInput); | ||
|
||
var scorer = new Transforms.Scorer | ||
{ | ||
PredictorModel = localModelOutput.PredictorModel | ||
}; | ||
|
||
var scorerOutput = experiment.Add(scorer); | ||
lastTransformModel = scorerOutput.ScoringTransform; | ||
step = new ScorerPipelineStep(scorerOutput.ScoredData, scorerOutput.ScoringTransform); | ||
transformModels.Clear(); | ||
} | ||
} | ||
PredictorModel = predictorDataStep.Model, | ||
TransformModels = new ArrayVar<ITransformModel>(transformModels.ToArray()) | ||
}; | ||
|
||
if (transformModels.Count > 0) | ||
{ | ||
transformModels.Insert(0,lastTransformModel); | ||
var modelInput = new Transforms.ModelCombiner | ||
var localModelOutput = experiment.Add(localModelInput); | ||
|
||
var scorer = new Transforms.Scorer | ||
{ | ||
Models = new ArrayVar<ITransformModel>(transformModels.ToArray()) | ||
PredictorModel = localModelOutput.PredictorModel | ||
}; | ||
|
||
var modelOutput = experiment.Add(modelInput); | ||
lastTransformModel = modelOutput.OutputModel; | ||
var scorerOutput = experiment.Add(scorer); | ||
lastTransformModel = scorerOutput.ScoringTransform; | ||
step = new ScorerPipelineStep(scorerOutput.ScoredData, scorerOutput.ScoringTransform); | ||
transformModels.Clear(); | ||
} | ||
} | ||
|
||
experiment.Compile(); | ||
foreach (ILearningPipelineLoader loader in loaders) | ||
if (transformModels.Count > 0) | ||
{ | ||
transformModels.Insert(0, lastTransformModel); | ||
var modelInput = new Transforms.ModelCombiner | ||
{ | ||
loader.SetInput(environment, experiment); | ||
} | ||
experiment.Run(); | ||
Models = new ArrayVar<ITransformModel>(transformModels.ToArray()) | ||
}; | ||
|
||
ITransformModel model = experiment.GetOutput(lastTransformModel); | ||
BatchPredictionEngine<TInput, TOutput> predictor; | ||
using (var memoryStream = new MemoryStream()) | ||
{ | ||
model.Save(environment, memoryStream); | ||
var modelOutput = experiment.Add(modelInput); | ||
lastTransformModel = modelOutput.OutputModel; | ||
} | ||
|
||
memoryStream.Position = 0; | ||
experiment.Compile(); | ||
foreach (ILearningPipelineLoader loader in loaders) | ||
{ | ||
loader.SetInput(Env, experiment); | ||
} | ||
experiment.Run(); | ||
|
||
predictor = environment.CreateBatchPredictionEngine<TInput, TOutput>(memoryStream); | ||
ITransformModel model = experiment.GetOutput(lastTransformModel); | ||
BatchPredictionEngine<TInput, TOutput> predictor; | ||
using (var memoryStream = new MemoryStream()) | ||
{ | ||
model.Save(Env, memoryStream); | ||
|
||
return new PredictionModel<TInput, TOutput>(predictor, memoryStream); | ||
} | ||
memoryStream.Position = 0; | ||
|
||
predictor = Env.CreateBatchPredictionEngine<TInput, TOutput>(memoryStream); | ||
|
||
return new PredictionModel<TInput, TOutput>(predictor, memoryStream); | ||
} | ||
} | ||
|
||
|
@@ -220,9 +228,9 @@ public PredictionModel<TInput, TOutput> Train<TInput, TOutput>() | |
/// <returns> | ||
/// The IDataView that was returned by the pipeline. | ||
/// </returns> | ||
internal IDataView Execute(IHostEnvironment environment) | ||
internal IDataView Execute() | ||
{ | ||
Experiment experiment = environment.CreateExperiment(); | ||
Experiment experiment = Env.CreateExperiment(); | ||
ILearningPipelineStep step = null; | ||
List<ILearningPipelineLoader> loaders = new List<ILearningPipelineLoader>(); | ||
foreach (ILearningPipelineItem currentItem in this) | ||
|
@@ -241,7 +249,7 @@ internal IDataView Execute(IHostEnvironment environment) | |
experiment.Compile(); | ||
foreach (ILearningPipelineLoader loader in loaders) | ||
{ | ||
loader.SetInput(environment, experiment); | ||
loader.SetInput(Env, experiment); | ||
} | ||
experiment.Run(); | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,96 @@ | ||
using Microsoft.ML.Runtime.Data; | ||
using System; | ||
|
||
namespace Microsoft.ML.Runtime | ||
{ | ||
public sealed class DefaultEnvironment : HostEnvironmentBase<DefaultEnvironment> | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a reason why this type is public? |
||
{ | ||
public DefaultEnvironment(int? seed = null, int conc = 0) | ||
: this(RandomUtils.Create(seed), true, conc) | ||
{ | ||
} | ||
|
||
public DefaultEnvironment(IRandom rand, bool verbose, int conc, string shortName = null, string parentFullName = null) : base(rand, verbose, conc, shortName, parentFullName) | ||
{ | ||
EnsureDispatcher<ChannelMessage>(); | ||
AddListener<ChannelMessage>(OnMessageRecieved); | ||
} | ||
|
||
void OnMessageRecieved(IMessageSource sender, ChannelMessage msg) | ||
{ | ||
ChannelMessageEventArgs eventArgs = new ChannelMessageEventArgs() { Message = msg }; | ||
MessageRecieved?.Invoke(this, eventArgs); | ||
} | ||
|
||
public event EventHandler<ChannelMessageEventArgs> MessageRecieved; | ||
public class ChannelMessageEventArgs : EventArgs | ||
{ | ||
public ChannelMessage Message { get; set; } | ||
} | ||
|
||
private sealed class Channel : ChannelBase | ||
{ | ||
public Channel(DefaultEnvironment master, ChannelProviderBase parent, string shortName, Action<IMessageSource, ChannelMessage> dispatch) | ||
: base(master, parent, shortName, dispatch) | ||
{ | ||
} | ||
} | ||
|
||
private sealed class Host : HostBase | ||
{ | ||
public new bool IsCancelled => Root.IsCancelled; | ||
|
||
public Host(HostEnvironmentBase<DefaultEnvironment> source, string shortName, string parentFullName, IRandom rand, bool verbose, int? conc) | ||
: base(source, shortName, parentFullName, rand, verbose, conc) | ||
{ | ||
} | ||
|
||
protected override IChannel CreateCommChannel(ChannelProviderBase parent, string name) | ||
{ | ||
Contracts.AssertValue(parent); | ||
Contracts.Assert(parent is Host); | ||
Contracts.AssertNonEmpty(name); | ||
return new Channel(Root, parent, name, GetDispatchDelegate<ChannelMessage>()); | ||
} | ||
|
||
protected override IPipe<TMessage> CreatePipe<TMessage>(ChannelProviderBase parent, string name) | ||
{ | ||
Contracts.AssertValue(parent); | ||
Contracts.Assert(parent is Host); | ||
Contracts.AssertNonEmpty(name); | ||
return new Pipe<TMessage>(parent, name, GetDispatchDelegate<TMessage>()); | ||
} | ||
|
||
protected override IHost RegisterCore(HostEnvironmentBase<DefaultEnvironment> source, string shortName, string parentFullName, IRandom rand, bool verbose, int? conc) | ||
{ | ||
return new Host(source, shortName, parentFullName, rand, verbose, conc); | ||
} | ||
} | ||
|
||
protected override IHost RegisterCore(HostEnvironmentBase<DefaultEnvironment> source, string shortName, string parentFullName, IRandom rand, bool verbose, int? conc) | ||
{ | ||
Contracts.AssertValue(rand); | ||
Contracts.AssertValueOrNull(parentFullName); | ||
Contracts.AssertNonEmpty(shortName); | ||
Contracts.Assert(source == this || source is Host); | ||
return new Host(source, shortName, parentFullName, rand, verbose, conc); | ||
} | ||
|
||
protected override IChannel CreateCommChannel(ChannelProviderBase parent, string name) | ||
{ | ||
Contracts.AssertValue(parent); | ||
Contracts.Assert(parent is DefaultEnvironment); | ||
Contracts.AssertNonEmpty(name); | ||
return new Channel(this, parent, name, GetDispatchDelegate<ChannelMessage>()); | ||
} | ||
|
||
protected override IPipe<TMessage> CreatePipe<TMessage>(ChannelProviderBase parent, string name) | ||
{ | ||
Contracts.AssertValue(parent); | ||
Contracts.Assert(parent is DefaultEnvironment); | ||
Contracts.AssertNonEmpty(name); | ||
return new Pipe<TMessage>(parent, name, GetDispatchDelegate<TMessage>()); | ||
} | ||
} | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HostEnvironmentBase just happens to be IDisposable.
Since this design makes the pipeline own an instance of Env, your have to make the class disposable,
Alternatively, you can take an optional environment and then the problem of owning env becomse the callers problem.
If a user has to pass an environment to this class there are several potental benefits: