Description
Time Series in ML.NET
Forecasting
- Singular Spectrum Analysis
- Models univariate time series. Implementation based on the model described in http://arxiv.org/pdf/1206.6910.pdf.
Anomaly Detection
- Spike Detector
- Detects spikes in an independent and identically distributed (IID) sequence using adaptive kernel density estimation (a rough sketch of the idea follows this list)
- Change Point Detector
- Detects change points in an independent and identically distributed (IID) sequence using adaptive kernel density estimation and martingales
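For intuition, here is a rough, self-contained sketch of the idea behind the spike detector: estimate the density of the values in a sliding window with a Gaussian kernel (fixed bandwidth here, whereas the actual component adapts it) and flag the current value when its estimated density falls below a threshold. The class, method, and parameter names are illustrative only, not the ML.NET API.

```csharp
using System;
using System.Collections.Generic;

public static class KdeSpikeSketch
{
    // Gaussian kernel density estimate of x given the window values.
    // A fixed bandwidth is used here; the actual component adapts the bandwidth per point.
    private static double Density(double x, IReadOnlyCollection<double> window, double bandwidth)
    {
        double sum = 0;
        foreach (var v in window)
        {
            double u = (x - v) / bandwidth;
            sum += Math.Exp(-0.5 * u * u) / Math.Sqrt(2 * Math.PI);
        }
        return sum / (window.Count * bandwidth);
    }

    // Flags a point as a spike when its estimated density under the sliding window's KDE
    // falls below the given threshold.
    public static IEnumerable<bool> DetectSpikes(
        IEnumerable<double> series, int windowSize = 50, double bandwidth = 1.0, double threshold = 1e-3)
    {
        var window = new Queue<double>();
        foreach (var x in series)
        {
            bool isSpike = window.Count >= windowSize / 2 && Density(x, window, bandwidth) < threshold;
            yield return isSpike;

            window.Enqueue(x);
            if (window.Count > windowSize)
                window.Dequeue();
        }
    }
}
```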
Smoothing transforms
- Exponential Average Transform
- Computes a weighted average of the values: ExpAvg(y_t) = a * y_t + (1 - a) * ExpAvg(y_{t-1}) (a minimal sketch follows this list).
- Moving Average Transform
- Applies a moving average to a time series
- Percentile Threshold Transform
- A sequential transform that decides whether the current value of the time series belongs to the top 'percentile' % of values in the sliding window. The output of the transform is a boolean flag.
- P Value Transform
- Calculates the p-value of the current input in the sequence with respect to the values in the sliding window (a naive empirical version is sketched after this list).
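As a quick illustration of the exponential-average recurrence above, here is a minimal sketch in plain C# (not the ML.NET transform itself); the smoothing factor `a` is the only parameter:

```csharp
using System.Collections.Generic;

public static class ExponentialAverageSketch
{
    // ExpAvg(y_t) = a * y_t + (1 - a) * ExpAvg(y_{t-1}); the state is seeded with the
    // first observation, and 'a' controls how quickly older values are forgotten.
    public static IEnumerable<float> ExponentialAverage(IEnumerable<float> series, float a = 0.1f)
    {
        float? state = null;
        foreach (var y in series)
        {
            state = state == null ? y : a * y + (1 - a) * state.Value;
            yield return state.Value;
        }
    }
}
```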
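Similarly, a naive empirical version of the p-value computation can be sketched as follows (illustration only, not the actual ML.NET transform):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PValueSketch
{
    // For each point, emits the fraction of the preceding sliding-window values that are
    // greater than or equal to it: an empirical one-sided p-value, where a small value
    // means the point is unusually large relative to its recent history.
    public static IEnumerable<double> EmpiricalPValues(IEnumerable<double> series, int windowSize = 50)
    {
        var window = new Queue<double>();
        foreach (var x in series)
        {
            yield return window.Count == 0
                ? 0.5
                : (double)window.Count(v => v >= x) / window.Count;

            window.Enqueue(x);
            if (window.Count > windowSize)
                window.Dequeue();
        }
    }
}
```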
New Features to come
- Estimator and PiGSTy APIs for the components below:

| Component | Priority |
| --- | --- |
| IidChangePointDetector | 0 |
| IidSpikeDetector | 0 |
| SsaChangePointDetector | 0 |
| ExponentialAverageTransform | 1 |
| MovingAverageTransform | 1 |
| PercentileThresholdTransform | 1 |
| PValueTransform | 1 |
| SlidingWindowTransform | 1 |

Example:
```csharp
var data = new[] { new Data() { Feature = 2 }, new Data() { Feature = 1 } };
var dataView = ComponentCreation.CreateDataView(Env, data);
var pipe = new SpikeDetectorEstimator(Env, new[]
{
    new SpikeDetectorTransformer.ColumnInfo("Feature", "Anomaly", twnd: 500, swnd: 50)
});
var result = pipe.Fit(dataView).Transform(dataView);
var resultRoles = new RoleMappedData(result);
```
- Prediction Engine
- The prediction engine we have today is stateless. For time series it is important that we update the state of the model as we make predictions, e.g. for SSA models where data points have a temporal relationship. This will require a new variant of the prediction engine to be used by the time series components listed above. See Support time series anomaly algorithms #163.
- Currently, to achieve a stateful prediction engine, users have to write the code below and then create checkpoints by saving the model every so often.
```csharp
const int size = 10;
List<Data> data = new List<Data>(size);
var dataView = env.CreateStreamingDataView(data);

List<Data> tempData = new List<Data>();
for (int i = 0; i < size / 2; i++)
    tempData.Add(new Data(5));
for (int i = 0; i < size / 2; i++)
    tempData.Add(new Data((float)(5 + i * 1.1)));
foreach (var d in tempData)
    data.Add(new Data(d.Value));

var args = new IidChangePointDetector.Arguments()
{
    Confidence = 80,
    Source = "Value",
    Name = "Change",
    ChangeHistoryLength = size,
    Data = dataView
};

// Train.
var detector = TimeSeriesProcessing.IidChangePointDetector(env, args);

// Anomaly detection.
var output = detector.Model.Apply(env, dataView);
var enumerator = output.AsEnumerable<Prediction>(env, true).GetEnumerator();

// Save the updated model.
detector.Model.Save();
```
- Instead, we could have an API like the one below that allows the user to save the model to disk or to a stream at fixed intervals while the model is being updated.
```csharp
using (var env = new LocalEnvironment(seed: 1, conc: 1))
{
    const int size = 10; // history length, as in the snippet above

    // Pipeline.
    var loader = TextLoader.ReadFile(env, MakeArgs(), new MultiFileSource(GetDataPath(TestDatasets.AnomalyDetection)));
    var cachedTrain = new CacheDataView(env, loader, prefetch: null);

    // Train.
    var iidDetector = new IidChangePointDetector(env, new IidChangePointDetector.Arguments
    {
        Confidence = 80,
        Source = "Features",
        Name = "Change",
        ChangeHistoryLength = size,
        Data = cachedTrain
    });
    var trainRoles = new RoleMappedData(cachedTrain, feature: "Features");
    var predictor = iidDetector.Train(new Runtime.TrainContext(trainRoles));

    PredictionEngine<Input, Prediction> model;
    using (var file = env.CreateTempFile())
    {
        // Save model.
        var roles = new RoleMappedData(cachedTrain, feature: "Features");
        using (var ch = env.Start("saving"))
            TrainUtils.SaveModel(env, ch, file, predictor, roles);

        // Load model.
        using (var fs = file.OpenReadStream())
            model = env.CreatePredictionEngine<Input, Prediction>(fs);
    }

    // Take a couple of examples out of the test data and run predictions on top.
    var testLoader = TextLoader.ReadFile(env, MakeArgs(), new MultiFileSource(GetDataPath(TestDatasets.AnomalyDetection)));
    var testData = testLoader.AsEnumerable<Input>(env, false);
    foreach (var input in testData.Take(10))
    {
        // Anomaly detection + update of the model state.
        var prediction = model.Predict(input);
    }

    // Save the model with updated state.
    model.Save();
}
```
- Online Training
- Currently we have to retrain the model on the entire training dataset to update it; instead, it would be nice if the model were updated as data came in, as sketched below. See Support time series anomaly algorithms #163.
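To make the intent concrete, the sketch below shows one possible shape of such an API; the interface and method names are hypothetical and do not exist in ML.NET today:

```csharp
using System.Collections.Generic;

// Hypothetical shape of an online-training API (not an existing ML.NET interface): the model
// exposes an Update method so that each incoming observation is both scored and folded back
// into the model's state, instead of requiring a full retrain on the whole dataset.
public interface IOnlineTimeSeriesModel<TInput, TPrediction>
{
    TPrediction Predict(TInput input);  // score the incoming point (forecast / anomaly score)
    void Update(TInput input);          // incorporate the point into the model state
}

public static class OnlineTrainingSketch
{
    public static IEnumerable<TPrediction> ScoreAndUpdate<TInput, TPrediction>(
        IOnlineTimeSeriesModel<TInput, TPrediction> model, IEnumerable<TInput> stream)
    {
        foreach (var point in stream)
        {
            yield return model.Predict(point); // prediction for this point
            model.Update(point);               // online update instead of retraining from scratch
        }
    }
}
```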
- Evaluator
- Currently we don’t have any evaluator for time series. Rolling CV is better suited to time-dependent datasets because it always tests on data that is newer than the training data; standard CV leaks future data into the training set. Rolling CV is also known as walk-forward, roll-forward, rolling-origin, or window CV (see the split sketch below). Refer to Rolling Cross-validation for Time-series #1026.
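A minimal sketch of how rolling-origin splits could be generated (index bookkeeping only, no ML.NET APIs involved; names are illustrative):

```csharp
using System.Collections.Generic;

public static class RollingCvSketch
{
    // Yields (trainEnd, testStart, testEnd) index triples for rolling-origin CV: each fold
    // trains on [0, trainEnd) and tests on [testStart, testEnd), so the test data is always
    // strictly newer than the training data and no future values leak into training.
    public static IEnumerable<(int TrainEnd, int TestStart, int TestEnd)> RollingOriginSplits(
        int seriesLength, int initialTrainSize, int testSize)
    {
        for (int trainEnd = initialTrainSize; trainEnd + testSize <= seriesLength; trainEnd += testSize)
            yield return (trainEnd, trainEnd, trainEnd + testSize);
    }
}
```

For a 100-point series with an initial training size of 60 and a test size of 10, this produces the folds (60, 60, 70), (70, 70, 80), (80, 80, 90), and (90, 90, 100).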
- ARIMA model
- It seems the first thing novice time series users look for in a toolkit for a forecasting task is an ARIMA model, because it is the first thing that comes up in search results for forecasting. While ARIMA is not the most accurate or performant model, it is the most well-known forecasting model. We should consider bringing a simple implementation of ARIMA into ML.NET. See Time series and forecasting #929.
- Time Series Featurizer
- The more performant models are the ones that combine features from a time series transform with non-time-series features and feed the resulting vector into a black-box regression learner. For example, one could have two features A and B, where A contains data points with a temporal relationship between them (e.g., stock price) and B contains a non-temporal feature such as country or zip code. We could feed A into an SSA transform that extracts components such as trend, level, and seasonality from each feature value, repeat this for all values of A, and then combine the resulting vector with feature B and feed it into a regressor for prediction (see the sketch below). The feature extraction step could be SSA or a deep learning model such as an LSTM, and the regressor could be any regression-based learner. See Time series and forecasting #929.
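Schematically, the featurization step composes like this; the extractor delegate stands in for SSA or an LSTM, and none of these names correspond to existing ML.NET APIs:

```csharp
using System;
using System.Linq;

public static class TimeSeriesFeaturizerSketch
{
    // Builds the input vector for a black-box regression learner by concatenating the
    // components extracted from the temporal feature A with the non-temporal features B.
    // 'extractComponents' is a placeholder for the SSA (or LSTM) featurization step.
    public static float[] BuildRegressorInput(
        float[] temporalWindowA,                   // e.g. a window of recent stock prices
        float[] staticFeaturesB,                   // e.g. one-hot encoded country or zip code
        Func<float[], float[]> extractComponents)  // returns e.g. trend, level, seasonality
    {
        float[] components = extractComponents(temporalWindowA);
        return components.Concat(staticFeaturesB).ToArray();
    }
}
```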
... and many more with time.