Description
Time Series in ML.NET
Forecasting
- Singular Spectrum Analysis
- Models univariate time series. Implementation based on the model described in http://arxiv.org/pdf/1206.6910.pdf.
Anomaly Detection
- Spike Detector
- Detects spikes in an independent and identically distributed (IID) sequence using adaptive kernel density estimation (a rough sketch of the idea follows this list)
- Change Point Detector
- Detects change points in an independent and identically distributed (IID) sequence using adaptive kernel density estimation and martingales
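For intuition, here is a rough, self-contained sketch of the idea behind the spike detector: estimate the density of the values in a sliding window with a Gaussian kernel (fixed bandwidth here, whereas the actual component adapts it) and flag the current value when its estimated density falls below a threshold. The class, method, and parameter names are illustrative only, not the ML.NET API.

```csharp
using System;
using System.Collections.Generic;

public static class KdeSpikeSketch
{
    // Gaussian kernel density estimate of x given the window values.
    // A fixed bandwidth is used here; the actual component adapts the bandwidth per point.
    private static double Density(double x, IReadOnlyCollection<double> window, double bandwidth)
    {
        double sum = 0;
        foreach (var v in window)
        {
            double u = (x - v) / bandwidth;
            sum += Math.Exp(-0.5 * u * u) / Math.Sqrt(2 * Math.PI);
        }
        return sum / (window.Count * bandwidth);
    }

    // Flags a point as a spike when its estimated density under the sliding window's KDE
    // falls below the given threshold.
    public static IEnumerable<bool> DetectSpikes(
        IEnumerable<double> series, int windowSize = 50, double bandwidth = 1.0, double threshold = 1e-3)
    {
        var window = new Queue<double>();
        foreach (var x in series)
        {
            bool isSpike = window.Count >= windowSize / 2 && Density(x, window, bandwidth) < threshold;
            yield return isSpike;

            window.Enqueue(x);
            if (window.Count > windowSize)
                window.Dequeue();
        }
    }
}
```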
Smoothing transforms
- Exponential Average Transform
- Computes a weighted average of the values: ExpAvg(y_t) = a * y_t + (1 - a) * ExpAvg(y_{t-1}) (a minimal sketch follows this list).
- Moving Average Transform
- Applies a moving average to a time series
- Percentile Threshold Transform
- A sequential transform that decides whether the current value of the time series belongs to the top 'percentile' % of values in the sliding window. The output of the transform is a boolean flag.
- P Value Transform
- Calculates the p-value of the current input in the sequence with respect to the values in the sliding window (a naive empirical version is sketched after this list).
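As a quick illustration of the exponential-average recurrence above, here is a minimal sketch in plain C# (not the ML.NET transform itself); the smoothing factor `a` is the only parameter:

```csharp
using System.Collections.Generic;

public static class ExponentialAverageSketch
{
    // ExpAvg(y_t) = a * y_t + (1 - a) * ExpAvg(y_{t-1}); the state is seeded with the
    // first observation, and 'a' controls how quickly older values are forgotten.
    public static IEnumerable<float> ExponentialAverage(IEnumerable<float> series, float a = 0.1f)
    {
        float? state = null;
        foreach (var y in series)
        {
            state = state == null ? y : a * y + (1 - a) * state.Value;
            yield return state.Value;
        }
    }
}
```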
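Similarly, a naive empirical version of the p-value computation can be sketched as follows (illustration only, not the actual ML.NET transform):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PValueSketch
{
    // For each point, emits the fraction of the preceding sliding-window values that are
    // greater than or equal to it: an empirical one-sided p-value, where a small value
    // means the point is unusually large relative to its recent history.
    public static IEnumerable<double> EmpiricalPValues(IEnumerable<double> series, int windowSize = 50)
    {
        var window = new Queue<double>();
        foreach (var x in series)
        {
            yield return window.Count == 0
                ? 0.5
                : (double)window.Count(v => v >= x) / window.Count;

            window.Enqueue(x);
            if (window.Count > windowSize)
                window.Dequeue();
        }
    }
}
```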
New Features to come
- Estimator and PiGSTy APIs for the components below:

| Component | Priority |
| --- | --- |
| IidChangePointDetector | 0 |
| IidSpikeDetector | 0 |
| SsaChangePointDetector | 0 |
| ExponentialAverageTransform | 1 |
| MovingAverageTransform | 1 |
| PercentileThresholdTransform | 1 |
| PValueTransform | 1 |
| SlidingWindowTransform | 1 |

Example:
```csharp
var data = new[] { new Data() { Feature = 2 }, new Data() { Feature = 1 } };
var dataView = ComponentCreation.CreateDataView(Env, data);
var pipe = new SpikeDetectorEstimator(Env, new[]
{
    new SpikeDetectorTransformer.ColumnInfo("Feature", "Anomaly", twnd: 500, swnd: 50)
});
var result = pipe.Fit(dataView).Transform(dataView);
var resultRoles = new RoleMappedData(result);
```
- Prediction Engine
- The prediction engine we have today is stateless. For time series it is important that we update the state of the model as we make predictions, e.g. for SSA models where data points have a temporal relationship. This will require a new variant of the prediction engine to be used by the time series components listed above. See Support time series anomaly algorithms #163.
- Currently, to achieve a stateful prediction engine, users have to write the code below and then create checkpoints by saving the model every so often.
```csharp
const int size = 10;
List<Data> data = new List<Data>(size);
var dataView = env.CreateStreamingDataView(data);

List<Data> tempData = new List<Data>();
for (int i = 0; i < size / 2; i++)
    tempData.Add(new Data(5));
for (int i = 0; i < size / 2; i++)
    tempData.Add(new Data((float)(5 + i * 1.1)));
foreach (var d in tempData)
    data.Add(new Data(d.Value));

var args = new IidChangePointDetector.Arguments()
{
    Confidence = 80,
    Source = "Value",
    Name = "Change",
    ChangeHistoryLength = size,
    Data = dataView
};

// Train.
var detector = TimeSeriesProcessing.IidChangePointDetector(env, args);

// Anomaly detection.
var output = detector.Model.Apply(env, dataView);
var enumerator = output.AsEnumerable<Prediction>(env, true).GetEnumerator();

// Save the updated model.
detector.Model.Save();
```
- Instead, we could have an API like the one below that allows the user to save the model to disk or to a stream at fixed intervals while the model is being updated.
```csharp
using (var env = new LocalEnvironment(seed: 1, conc: 1))
{
    const int size = 10; // history length, as in the snippet above

    // Pipeline.
    var loader = TextLoader.ReadFile(env, MakeArgs(), new MultiFileSource(GetDataPath(TestDatasets.AnomalyDetection)));
    var cachedTrain = new CacheDataView(env, loader, prefetch: null);

    // Train.
    var iidDetector = new IidChangePointDetector(env, new IidChangePointDetector.Arguments
    {
        Confidence = 80,
        Source = "Features",
        Name = "Change",
        ChangeHistoryLength = size,
        Data = cachedTrain
    });
    var trainRoles = new RoleMappedData(cachedTrain, feature: "Features");
    var predictor = iidDetector.Train(new Runtime.TrainContext(trainRoles));

    PredictionEngine<Input, Prediction> model;
    using (var file = env.CreateTempFile())
    {
        // Save model.
        var roles = new RoleMappedData(cachedTrain, feature: "Features");
        using (var ch = env.Start("saving"))
            TrainUtils.SaveModel(env, ch, file, predictor, roles);

        // Load model.
        using (var fs = file.OpenReadStream())
            model = env.CreatePredictionEngine<Input, Prediction>(fs);
    }

    // Take a couple of examples out of the test data and run predictions on top.
    var testLoader = TextLoader.ReadFile(env, MakeArgs(), new MultiFileSource(GetDataPath(TestDatasets.AnomalyDetection)));
    var testData = testLoader.AsEnumerable<Input>(env, false);
    foreach (var input in testData.Take(10))
    {
        // Anomaly detection + update of the model state.
        var prediction = model.Predict(input);
    }

    // Save the model with updated state.
    model.Save();
}
```
- Online Training
- Currently we have to retrain the model on the entire training dataset to update it; instead, it would be nice if the model were updated as data came in, as sketched below. See Support time series anomaly algorithms #163.
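To make the intent concrete, the sketch below shows one possible shape of such an API; the interface and method names are hypothetical and do not exist in ML.NET today:

```csharp
using System.Collections.Generic;

// Hypothetical shape of an online-training API (not an existing ML.NET interface): the model
// exposes an Update method so that each incoming observation is both scored and folded back
// into the model's state, instead of requiring a full retrain on the whole dataset.
public interface IOnlineTimeSeriesModel<TInput, TPrediction>
{
    TPrediction Predict(TInput input);  // score the incoming point (forecast / anomaly score)
    void Update(TInput input);          // incorporate the point into the model state
}

public static class OnlineTrainingSketch
{
    public static IEnumerable<TPrediction> ScoreAndUpdate<TInput, TPrediction>(
        IOnlineTimeSeriesModel<TInput, TPrediction> model, IEnumerable<TInput> stream)
    {
        foreach (var point in stream)
        {
            yield return model.Predict(point); // prediction for this point
            model.Update(point);               // online update instead of retraining from scratch
        }
    }
}
```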
- Evaluator
- Currently we don’t have any evaluator for time series. Rolling CV is better suited to time-dependent datasets because it always tests on data that is newer than the training data; standard CV leaks future data into the training set. Rolling CV is also known as walk-forward, roll-forward, rolling-origin, or window CV (see the split sketch below). Refer to Rolling Cross-validation for Time-series #1026.
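A minimal sketch of how rolling-origin splits could be generated (index bookkeeping only, no ML.NET APIs involved; names are illustrative):

```csharp
using System.Collections.Generic;

public static class RollingCvSketch
{
    // Yields (trainEnd, testStart, testEnd) index triples for rolling-origin CV: each fold
    // trains on [0, trainEnd) and tests on [testStart, testEnd), so the test data is always
    // strictly newer than the training data and no future values leak into training.
    public static IEnumerable<(int TrainEnd, int TestStart, int TestEnd)> RollingOriginSplits(
        int seriesLength, int initialTrainSize, int testSize)
    {
        for (int trainEnd = initialTrainSize; trainEnd + testSize <= seriesLength; trainEnd += testSize)
            yield return (trainEnd, trainEnd, trainEnd + testSize);
    }
}
```

For a 100-point series with an initial training size of 60 and a test size of 10, this produces the folds (60, 60, 70), (70, 70, 80), (80, 80, 90), and (90, 90, 100).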
- ARIMA model
- It seems the first thing novice time series users look for in a toolkit for a forecasting task is an ARIMA model, because it is the first thing that comes up in search results for forecasting. While ARIMA is not the most accurate or performant model, it is the most well-known forecasting model. We should consider bringing a simple implementation of ARIMA into ML.NET. See Time series and forecasting #929.
- Time Series Featurizer
- The more performant models are the ones that combine features from a time series transform with non-time-series features and feed the resulting vector into a black-box regression learner. For example, one could have two features A and B, where A contains data points with a temporal relationship between them (e.g., stock price) and B contains a non-temporal feature such as country or zip code. We could feed A into an SSA transform that extracts components such as trend, level, and seasonality from each feature value, repeat this for all values of A, and then combine the resulting vector with feature B and feed it into a regressor for prediction (see the sketch below). The feature extraction step could be SSA or a deep learning model such as an LSTM, and the regressor could be any regression-based learner. See Time series and forecasting #929.
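Schematically, the featurization step composes like this; the extractor delegate stands in for SSA or an LSTM, and none of these names correspond to existing ML.NET APIs:

```csharp
using System;
using System.Linq;

public static class TimeSeriesFeaturizerSketch
{
    // Builds the input vector for a black-box regression learner by concatenating the
    // components extracted from the temporal feature A with the non-temporal features B.
    // 'extractComponents' is a placeholder for the SSA (or LSTM) featurization step.
    public static float[] BuildRegressorInput(
        float[] temporalWindowA,                   // e.g. a window of recent stock prices
        float[] staticFeaturesB,                   // e.g. one-hot encoded country or zip code
        Func<float[], float[]> extractComponents)  // returns e.g. trend, level, seasonality
    {
        float[] components = extractComponents(temporalWindowA);
        return components.Concat(staticFeaturesB).ToArray();
    }
}
```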
... and many more with time.