Skip to content

Forecasting model framework for time series. #1900

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Jun 3, 2019
Merged

Conversation

codemzs
Copy link
Member

@codemzs codemzs commented Dec 18, 2018

This PR introduces forecasting framework/interface for time series. It allows the following:

  1. Train forecasting model from an IDataView.
  2. Update the model with new observation using an IDataView.
  3. Forecast values up to a certain horizon (with confidence intervals).
  4. Checkpoint the model to disk.
  5. Load a model from disk.

fixes #929
fixes #3151

Copy link
Contributor

@shauheen shauheen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not addressing an issue.

@codemzs
Copy link
Member Author

codemzs commented Dec 18, 2018

@shauheen this PR addresses forecasting capability requested in issue #929 #Resolved

@shauheen
Copy link
Contributor

shauheen commented Dec 18, 2018

@codemzs That issue has not been triaged and does not have a clear design. #Resolved

@codemzs codemzs closed this Dec 18, 2018
@codemzs codemzs reopened this May 14, 2019
@codemzs codemzs dismissed shauheen’s stale review May 29, 2019 22:26

It is not triaged for 1.1 release.

@codemzs codemzs requested review from artidoro, wschin and ganik and removed request for Zruty0 May 29, 2019 22:30
@ganik
Copy link
Member

ganik commented May 30, 2019

    /// Train a forecasting model from an <see cref="IDataView"/>.

Is there an entrypoint for this? This would be needed for NimbusML #Resolved


Refers to: src/Microsoft.ML.TimeSeries/AdaptiveSingularSpectrumSequenceModeler.cs:1631 in 5a29c42. [](commit_id = 5a29c42, deletion_comment = False)

@ganik
Copy link
Member

ganik commented May 30, 2019

    public float[] Forecast(int horizon) => _modeler.Forecast(horizon);

There should be API that does not only single point forecast but also gives area forecast based on confidence interval provided #Resolved


Refers to: src/Microsoft.ML.TimeSeries/AdaptiveSingularSpectrumSequenceModeler.cs:1649 in 5a29c42. [](commit_id = 5a29c42, deletion_comment = False)

@artidoro
Copy link
Contributor

artidoro commented May 30, 2019

    public enum RankSelectionMethod

Can you add comments for public fields? #Resolved


Refers to: src/Microsoft.ML.TimeSeries/AdaptiveSingularSpectrumSequenceModeler.cs:31 in 5a29c42. [](commit_id = 5a29c42, deletion_comment = False)

@artidoro
Copy link
Contributor

artidoro commented May 30, 2019

    {

Same here if you need these to be public, can you add documentation? #Resolved


Refers to: src/Microsoft.ML.TimeSeries/AdaptiveSingularSpectrumSequenceModeler.cs:39 in 5a29c42. [](commit_id = 5a29c42, deletion_comment = False)

@artidoro
Copy link
Contributor

artidoro commented May 30, 2019

    public AdaptiveSingularSpectrumSequenceForecastingModeler(IHostEnvironment env, int trainSize, int seriesLength, int windowSize, Single discountFactor = 1,

We have moved away from having public constructors. All our transforms/trainers are found through MLContext. We should follow the same pattern here I believe. #Resolved


Refers to: src/Microsoft.ML.TimeSeries/AdaptiveSingularSpectrumSequenceModeler.cs:83 in 5a29c42. [](commit_id = 5a29c42, deletion_comment = False)

@codemzs
Copy link
Member Author

codemzs commented May 31, 2019

    /// Train a forecasting model from an <see cref="IDataView"/>.

We spoke offline and agreed there is a separate work item for entrypoints in time series,


In reply to: 497205979 [](ancestors = 497205979)


Refers to: src/Microsoft.ML.TimeSeries/AdaptiveSingularSpectrumSequenceModeler.cs:1631 in 5a29c42. [](commit_id = 5a29c42, deletion_comment = False)

@codemzs
Copy link
Member Author

codemzs commented May 31, 2019

    public float[] Forecast(int horizon) => _modeler.Forecast(horizon);

Agreed but currently we don't have confidence intervals. I can check with Saeed and see if we can add later.


In reply to: 497206778 [](ancestors = 497206778)


Refers to: src/Microsoft.ML.TimeSeries/AdaptiveSingularSpectrumSequenceModeler.cs:1649 in 5a29c42. [](commit_id = 5a29c42, deletion_comment = False)

@ganik
Copy link
Member

ganik commented May 31, 2019

    public AdaptiveSingularSpectrumSequenceForecastingModeler(IHostEnvironment env, int trainSize, int seriesLength, int windowSize, Single discountFactor = 1,

+1


In reply to: 497401572 [](ancestors = 497401572)


Refers to: src/Microsoft.ML.TimeSeries/AdaptiveSingularSpectrumSequenceModeler.cs:83 in 5a29c42. [](commit_id = 5a29c42, deletion_comment = False)

@ganik
Copy link
Member

ganik commented May 31, 2019

    public float[] Forecast(int horizon) => _modeler.Forecast(horizon);

see this method ComputeForecastIntervals in the modeler class


In reply to: 497860420 [](ancestors = 497860420,497206778)


Refers to: src/Microsoft.ML.TimeSeries/AdaptiveSingularSpectrumSequenceModeler.cs:1649 in 5a29c42. [](commit_id = 5a29c42, deletion_comment = False)

@yaeldekel
Copy link

yaeldekel commented May 31, 2019

    internal sealed class AdaptiveSingularSpectrumSequenceModeler : SequenceModelerBase<Single, Single>

Can it be private? #Resolved


Refers to: src/Microsoft.ML.TimeSeries/AdaptiveSingularSpectrumSequenceModeler.cs:135 in 47e5f22. [](commit_id = 47e5f22, deletion_comment = False)

@codemzs
Copy link
Member Author

codemzs commented May 31, 2019

    internal sealed class AdaptiveSingularSpectrumSequenceModeler : SequenceModelerBase<Single, Single>

It can't because it is used in SSABase class as well.


In reply to: 497870543 [](ancestors = 497870543)


Refers to: src/Microsoft.ML.TimeSeries/AdaptiveSingularSpectrumSequenceModeler.cs:135 in 47e5f22. [](commit_id = 47e5f22, deletion_comment = False)

@codemzs
Copy link
Member Author

codemzs commented Jun 3, 2019

    /// </summary>

We discussed offline and decided to open an issue for this.


In reply to: 498346346 [](ancestors = 498346346)


Refers to: src/Microsoft.ML.TimeSeries/AdaptiveSingularSpectrumSequenceModeler.cs:36 in 2950dfa. [](commit_id = 2950dfa, deletion_comment = False)

@CESARDELATORRE
Copy link
Contributor

Question: Why is this 'Time Series Forecasting' algorithm/component not aligned to the rest of trainers in ML.NET?

Here, with 'Time Series Forecasting' you do the following:

  • You don't provide an specific trainer to the pipeline but run: ml.Forecasting.AdaptiveSingularSpectrumSequenceModeler()

  • You don't create a Prediction Engine when scoring but call 'model.Forecast()'

  • You don't call .Transform() when predicting in bulk, instead you call model.Forecast()

  • Also, saving and loading models into a .ZIP file is different (singular specific API implementation):
    ml.Model.SaveForecastingModel() and ml.Model.LoadForecastingModel

I think this 'singularity' in the API might cause some issues..
For instance, you cannot use the Microsoft.Extensions.ML PredictionEnginePool with this 'Time Series Forecasting' component or any other typical ML.NET code. This API is completely different.

Is it possible to shape its API on a similar way that other algorithms/trainers are in ML.NET?

@codemzs
Copy link
Member Author

codemzs commented Jun 3, 2019

@CESARDELATORRE The question you should ask yourself is if time series forecasting behaves the same way as other trainers and transforms? I'll be happy to explain the rationale behind design choices and we have a month to make any changes if needed before we make this GA.

@CESARDELATORRE
Copy link
Contributor

CESARDELATORRE commented Jun 3, 2019

Yes please, I'd like you to explain the rationale behind the API design here. If possible the internal implementation shouldn't surface to the external API interfaces. For instance, why do we have a different API interface for Loading/Saving to .ZIP file?: ml.Model.SaveForecastingModel() and ml.Model.LoadForecastingModel(). Let's meet to chat and discuss about it, ok? :)

@codemzs
Copy link
Member Author

codemzs commented Jun 3, 2019

@CESARDELATORRE sounds good.

@codemzs codemzs merged commit 380f167 into dotnet:master Jun 3, 2019
@rdzidziguri
Copy link

@CESARDELATORRE I found some API designs relatively questionable as well please see #3151 (comment), but as long as this is highly requested feature and we needed it a lot, for now, we are good to use this approach while keeping in mind that API surface might change in the future.

@eerhardt
Copy link
Member

eerhardt commented Jun 4, 2019

I agree that the new APIs introduced in this PR are drastically different (such as using different terms/words for what appear to be the same operations) from the existing ML.NET framework.

@codemzs - can you add a design document into the docs folder that describes why these forecasting APIs cannot fit into the existing framework of estimator, tranformer, data #581?

public struct GrowthRatio
{
private int _timeSpan;
private Double _growth;

/// <summary>
/// Time span of growth ratio. Must be strictly positive.
/// </summary>
public int TimeSpan
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use an int here? Why not use a TimeSpan type, which avoids any confusion about what unit this int is? seconds? Minutes? days?


/// <summary>
/// Growth ratio. Defined as Growth^(1/TimeSpan).
/// </summary>
public struct GrowthRatio
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is using a mutable struct here the best API? Should we instead use a readonly struct? Or possibly make this a class.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 26, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Time Series forecasting Time series and forecasting
10 participants