Add SrCnn entire API by implementing function #5135

mengaims · 2020-05-15T13:26:35Z

We are excited to review your PR.

So we can do the best job, please check:

There's a descriptive title that will make sense to other developers some time from now.
There's associated issues. All PR's should have issue(s) associated - unless a trivial self-evident change such as fixing a typo. You can use the format Fixes #nnnn in your description to cause GitHub to automatically close the issue(s) when your PR is merged.
Your change description explains what the change does, why you chose your approach, and anything else that reviewers should know.
You have included any necessary tests in the same PR.

* POC Batch transform * SrCnn batch interface * Removed comment * Handled some APIreview comments. * Handled other review comments. * Resolved review comments. Added sample. Co-authored-by: Yael Dekel <[email protected]>

…ire_function

mengaims · 2020-05-15T13:31:52Z

src/Microsoft.ML.TimeSeries/SrCnnBatchAnomalyDetection.cs

+            if (batchSize == -1)
+            {
+                _batchSize = (int)input.GetRowCount();
+            }


In the transformer version, we allow batchSize to be -1, means using the whole data set as one batch. Here I try to get row count from input when batchSize=-1 and set an actual positive batchSize. But I found that if we load dataview by text file loader, we could not get this count by this way. Is there any other way to achieve this? #Resolved

ML.NET does lazy evaluation of most things. So the textloader may not know the number of rows ahead of time. When batchSize is -1, you will be computing the anomaly on the whole file and so you are going to read the whole file before doing your computation. You will need to pass the -1 value of _batchSize further and calculate the actual row count after you have read all the data and before computation begins.

In reply to: 425803839 [](ancestors = 425803839)

Correct. I commented the same thing above before I noticed your comment.
If you need the number of rows in the data, you can compute it here in the ctor, by getting a row cursor with no active columns and counting how many times MoveNext() returns true. So you can do something like this:

var rowCount = input.GetRowCount(); if (rowCount!=null) _batchSize = rowCount; else { _batchSize = 0; using (var cursor = input.GetRowCursor()) { while (cursor.MoveNext()) _batchSize++; } }

In reply to: 426029554 [](ancestors = 426029554,425803839)

By the way, you would need to check for overflow in this case, since the row count can be greater than int.MaxValue.

In reply to: 426407146 [](ancestors = 426407146,426029554,425803839)

I have some concern about count batchSize here since this may introduce an extra traversing. Could we allow batchSize=-1 when initialize Batch object, and in GetIsNewBatchDelegate and GetLastInBatchDelegate, add a special condition for _batchSize=1? In Batch class, since data will be read in later, there is no need to get batchSize from outside when no batch is actually used.

In reply to: 426407783 [](ancestors = 426407783,426407146,426029554,425803839)

Fixed, please check.

In reply to: 426719317 [](ancestors = 426719317,426407783,426407146,426029554,425803839)

harishsk · 2020-05-15T19:25:55Z

src/Microsoft.ML.TimeSeries/ExtensionsCatalog.cs

+        /// </format>
+        /// </example>
+        public static IDataView BatchDetectAnomalyBySrCnn(this DataOperationsCatalog catalog, IDataView input, string outputColumnName, string inputColumnName,
+            double threshold = 0.3, int batchSize = 1024, double sensitivity = 99, SrCnnDetectMode detectMode = SrCnnDetectMode.AnomalyOnly)


Shouldn't this be in the TimeSeries catalog? #Resolved

It was the suggestion from @yaeldekel to put it here #Resolved

@yaeldekel, Can you please elaborate? We already have a TimeSeries catalog. Shouldn't it belong there?

In reply to: 426065077 [](ancestors = 426065077)

In previous iterations it was in the TransformsCatalog which is why I suggested moving it. TimeSeriesCatalog would also work.

In reply to: 426078156 [](ancestors = 426078156,426065077)

I see that RCA used a newly defined AnomalyDetectionCatalog, maybe it would be better to align with it?

In reply to: 426439074 [](ancestors = 426439074,426078156,426065077)

Yes, Please move it to the AnomalyDetectionCatalog

In reply to: 427140758 [](ancestors = 427140758,426439074,426078156,426065077)

harishsk · 2020-05-15T20:02:55Z

src/Microsoft.ML.TimeSeries/SrCnnBatchAnomalyDetection.cs

+        private readonly string _inputColumnName;
+        private readonly int _outputLength;
+        private readonly SrCnnEntireModeler _modler;
+


typo: _modeler #Resolved

fixed.

In reply to: 426020483 [](ancestors = 426020483)

harishsk · 2020-05-15T20:28:15Z

src/Microsoft.ML.TimeSeries/SrCnnBatchAnomalyDetection.cs

+            {
+                _outputLength = AnomalyOnlyOutputLength;
+                _modler = new SrCnnEntireModeler(threshold, sensitivity, detectMode, _outputLength);
+            }


suggestion: This line is common to all the if blocks and can be moved out of the if blocks. #Resolved

Fixed.

In reply to: 426031408 [](ancestors = 426031408)

harishsk · 2020-05-15T20:29:32Z

src/Microsoft.ML.TimeSeries/SrCnnBatchAnomalyDetection.cs

+        private const int AnomalyOnlyOutputLength = 3;
+        private const int AnomalyAndExpectedValueOutputLength = 4;
+        private const int AnomalyAndMarginOutputLength = 7;
+


suggestion: You can put this in an array to match the index of SrCnnDetectMode and reference the array when initializing the value of _outputLength in the constructor #Resolved

fixed.

In reply to: 426031904 [](ancestors = 426031904)

yaeldekel · 2020-05-18T06:36:03Z

src/Microsoft.ML.TimeSeries/SrCnnBatchAnomalyDetection.cs

+        }
+
+        public SrCnnBatchAnomalyDetector(IHostEnvironment env, IDataView input, string inputColumnName, string outputColumnName, double threshold, int batchSize, double sensitivity, SrCnnDetectMode detectMode)
+            : base(env, "SrCnnBatchAnomalyDetector", input)


SrCnnBatchAnomalyDetector [](start = 25, length = 25)

Please use nameof(SrCnnBatchAnomalyDetector). #Resolved

fixed,

In reply to: 426396967 [](ancestors = 426396967)

yaeldekel · 2020-05-18T06:37:10Z

src/Microsoft.ML.Data/DataView/BatchDataViewMapperBase.cs

+        public DataViewSchema Schema => SchemaBindings.AsSchema;
+
+        private readonly IDataView _source;
+        private readonly IHost _host;


_host [](start = 31, length = 5)

Make this protected, and use in the deriving class. #Resolved

fixed.

In reply to: 426397354 [](ancestors = 426397354)

yaeldekel · 2020-05-18T06:37:41Z

src/Microsoft.ML.TimeSeries/SrCnnBatchAnomalyDetection.cs

+        public SrCnnBatchAnomalyDetector(IHostEnvironment env, IDataView input, string inputColumnName, string outputColumnName, double threshold, int batchSize, double sensitivity, SrCnnDetectMode detectMode)
+            : base(env, "SrCnnBatchAnomalyDetector", input)
+        {
+            Contracts.CheckValue(env, nameof(env));


CheckValue [](start = 22, length = 10)

This check should be done in the base class. #Resolved

fixed.

In reply to: 426397534 [](ancestors = 426397534)

yaeldekel · 2020-05-18T06:38:10Z

src/Microsoft.ML.TimeSeries/SrCnnBatchAnomalyDetection.cs

+        {
+            Contracts.CheckValue(env, nameof(env));
+
+            Contracts.CheckValue(inputColumnName, nameof(inputColumnName));


Contracts [](start = 12, length = 9)

The IHost in the base class should be protected, and it should be used here. #Resolved

fixed.

In reply to: 426397727 [](ancestors = 426397727)

yaeldekel · 2020-05-18T06:38:40Z

src/Microsoft.ML.TimeSeries/SrCnnBatchAnomalyDetection.cs

+            Contracts.CheckValue(inputColumnName, nameof(inputColumnName));
+            _inputColumnName = inputColumnName;
+
+            env.CheckUserArg(batchSize == -1 || batchSize >= MinBatchSize, nameof(batchSize), "BatchSize must be -1 or no less than 12.");


env [](start = 12, length = 3)

Use IHost from base class. #Resolved

fixed.

In reply to: 426397902 [](ancestors = 426397902)

yaeldekel · 2020-05-18T06:41:59Z

src/Microsoft.ML.TimeSeries/SrCnnBatchAnomalyDetection.cs

+            env.CheckUserArg(batchSize == -1 || batchSize >= MinBatchSize, nameof(batchSize), "BatchSize must be -1 or no less than 12.");
+            if (batchSize == -1)
+            {
+                _batchSize = (int)input.GetRowCount();


GetRowCount [](start = 40, length = 11)

That will most likely not work, since most implementations of IDataView return null for this method. #Resolved

yaeldekel · 2020-05-18T07:04:59Z

src/Microsoft.ML.TimeSeries/SrCnnBatchAnomalyDetection.cs

+                || detectMode == SrCnnDetectMode.AnomalyAndExpectedValue
+                || detectMode == SrCnnDetectMode.AnomalyAndMargin, nameof(detectMode), "Invalid detectMode");
+
+            if (detectMode.Equals(SrCnnDetectMode.AnomalyOnly))


AnomalyOnly [](start = 50, length = 11)

Since this is an enum, this can be done using a switch statement. #Resolved

As commented above, you don't need an if or a switch statement if you store the possible values for _outputLength in an const array whose indices match the the enum values.

In reply to: 426408554 [](ancestors = 426408554)

Fixed.

In reply to: 426828074 [](ancestors = 426828074,426408554)

yaeldekel · 2020-05-18T07:07:59Z

src/Microsoft.ML.TimeSeries/SrCnnBatchAnomalyDetection.cs

+
+        protected override Batch InitializeBatch(DataViewRowCursor input) => new Batch(_batchSize, _outputLength, _modler);
+
+        protected override Func<bool> GetIsNewBatchDelegate(DataViewRowCursor input)


GetIsNewBatchDelegate [](start = 38, length = 21)

I think this method and GetLastInBatchDelegate can be implemented in the base class, I can't imagine a future implementation of the base class that would use a different way of computing this. @harishsk, what do you think? #Resolved

Since a check for _batchSize==-1 is added here, I think it's proper to keep it in inherit class

In reply to: 426409875 [](ancestors = 426409875)

I think adding -1 as an option meaning "the entire data view" is reasonable for the base class, but I won't insist on it if you think that it is something specific to this implementation of the base class.
/cc @harishsk

In reply to: 427135642 [](ancestors = 427135642,426409875)

Using -1 as the "entire data view" is reasonable. But if @mengaims feels it is proper to keep it in the inherited class, I am okay with it since we only have one derived class using it now. If we get another inherited class, we can revisit this then and refactor if necessary. Since this is an internal implementation detail, there shouldn't be any issues.

In reply to: 427278034 [](ancestors = 427278034,427135642,426409875)

yaeldekel · 2020-05-18T07:10:29Z

src/Microsoft.ML.TimeSeries/SrCnnBatchAnomalyDetection.cs

+
+        protected override Func<int, bool> GetSchemaBindingDependencies(Func<int, bool> predicate)
+        {
+            return (SchemaBindings as Bindings).GetDependencies(predicate);


SchemaBindings [](start = 20, length = 14)

Suggestion: Add a private field of type Bindings:

private readonly Bindings _bindings;

and then the override of SchemaBindings would be:

public override ColumnBindingsBase SchemaBindings => _bindings;

and this method would return _bindings.GetDependencies(predicate). #Resolved

Fixed.

In reply to: 426410915 [](ancestors = 426410915)

yaeldekel · 2020-05-18T07:12:23Z

src/Microsoft.ML.TimeSeries/SrCnnBatchAnomalyDetection.cs

+            private readonly int _batchSize;
+            private readonly int _outputLength;
+            private SrCnnEntireModeler _modler;
+            private double[][] _results;


_results [](start = 31, length = 8)

Why not make this readonly? That way you don't have to allocate a new array for every batch. #Resolved

Due to the scenario that bachSize=-1, and we do not know the exact row count until we read in data, results could not be allocated when initialize Batch.

In reply to: 426411747 [](ancestors = 426411747)

yaeldekel · 2020-05-18T07:15:39Z

src/Microsoft.ML.TimeSeries/SrCnnBatchAnomalyDetection.cs

+            }
+        }
+
+        public sealed class SrCnnEntireModeler


public [](start = 8, length = 6)

Can this be private? #Resolved

I agree. Ideally this should be private

In reply to: 426413214 [](ancestors = 426413214)

This class is similar as Batch class, used by protect methods as parameter in SrCnnBatchAnomalyDetector, if make them private, the accessasiblity will not be aligned and there will be compile errors. I've made them internal.

In reply to: 426828580 [](ancestors = 426828580,426413214)

yaeldekel · 2020-05-18T07:26:33Z

src/Microsoft.ML.TimeSeries/SrCnnBatchAnomalyDetection.cs

+                _threshold = threshold;
+                _sensitivity = sensitivity;
+                _detectMode = detectMode;
+                _outputLength = outputLength;


_outputLength [](start = 16, length = 13)

_outputLength is determined by _detectMode, so you can either use the array in the wrapping class like Harish suggested, or check that _outputLength and _detectMode are consistent. (actually, if you make this a private class, then an assert instead of a check would be enough). #Resolved

Fixed.

In reply to: 426418482 [](ancestors = 426418482)

yaeldekel · 2020-05-18T07:27:33Z

src/Microsoft.ML.TimeSeries/SrCnnBatchAnomalyDetection.cs

+
+            public double[][] Train(double[] values)
+            {
+                double[][] results = new double[values.Length][];


results [](start = 27, length = 7)

Pass in an array so you don't have to allocate a new one on every call. #Resolved

Reason explained above.

In reply to: 426418976 [](ancestors = 426418976)

In that case, then should it be a ref input to the method? that way you allocate a new array only if it's null.
There are two cases: when batchSize=-1, and when batchSize>0. In the first case, you allocate only once anyway since you call this method only after the entire pass over the data. In the second case, you would allocate new arrays for each batch, wouldn't it improve perf to only allocate once?

In reply to: 427138599 [](ancestors = 427138599,426418976)

Fixed #Resolved

yaeldekel · 2020-05-18T07:28:29Z

src/Microsoft.ML.TimeSeries/SrCnnBatchAnomalyDetection.cs

+                return results;
+            }
+
+            private static void SpecturalResidual(double[] values, double[][] results, double threshold)


SpecturalResidual [](start = 32, length = 17)

Should this be SpectralResidual? #Resolved

fixed.

In reply to: 426419496 [](ancestors = 426419496)

yaeldekel · 2020-05-18T07:30:15Z

src/Microsoft.ML.TimeSeries/SrCnnBatchAnomalyDetection.cs

+
+                // Step 2: FFT transformation
+                int length = backAddList.Length;
+                double[] fftRe = new double[length];


fftRe [](start = 25, length = 5)

Is there a way to make these fields of the class so they are only allocated once? #Resolved

I would make this class contain buffers for all these computations to avoid multiple allocations of arrays. This would mean that the SrCnnBatchAnomalyDetector cannot have an SrcEntireModeler as a field, and the InitializeBatch method would have to instantiate it (IDataVews should be thread safe, since it should be possible to create multiple row cursors on multiple threads from an IDataView object).

In reply to: 426420411 [](ancestors = 426420411)

I think the problem here is the same as results array, due to the batchSizxe=-1 scenario, we will not know the length of these arrays until we actually read in data.

In reply to: 426431891 [](ancestors = 426431891,426420411)

Allocating a bunch of arrays for each batch might have a perf price, with the GC going into overdrive. Consider allocating them only once and re-using them for subsequent batches.

In reply to: 427139611 [](ancestors = 427139611,426431891,426420411)

If every instance of SrEntireModeler contained preallocated buffers, would the result not be the same (since InitializeBatch would then have to allocate a new instance of SrEntireModeler)? #Resolved

No, because InitializeBatch is called only once, when the row cursor is created (see line 100 in BatchDataViewMapperBase.cs). The SrEntireModeler methods (Train and the helper methods that are called from it) are called after every batch is read from the input cursor (see line 168 in BatchDataViewMapperBase.cs).

In reply to: 427614415 [](ancestors = 427614415)

Perhaps InitializeBatch should be renamed CreateBatch to avoid confusion?

In reply to: 427764070 [](ancestors = 427764070,427614415)

done.

In reply to: 427764306 [](ancestors = 427764306,427764070,427614415)

fixed.

In reply to: 427285088 [](ancestors = 427285088,427139611,426431891,426420411)

yaeldekel · 2020-05-18T07:58:37Z

src/Microsoft.ML.TimeSeries/SrCnnBatchAnomalyDetection.cs

+                return cumSumList;
+            }
+
+            private static double CalculateSocre(double mag, double avgMag)


Soc [](start = 43, length = 3)

Typo. #Resolved

Fixed.

In reply to: 426435676 [](ancestors = 426435676)

yaeldekel · 2020-05-18T08:00:26Z

src/Microsoft.ML.TimeSeries/SrCnnBatchAnomalyDetection.cs

+using System.Linq;
+using Microsoft.ML.Data;
+using Microsoft.ML.Data.DataView;
+using Microsoft.ML.Numeric;


Numeric [](start = 19, length = 7)

Is this used? #Resolved

removed.

In reply to: 426436746 [](ancestors = 426436746)

yaeldekel · 2020-05-18T08:03:27Z

src/Microsoft.ML.TimeSeries/ExtensionsCatalog.cs

+        /// ]]>
+        /// </format>
+        /// </example>
+        public static IDataView BatchDetectAnomalyBySrCnn(this DataOperationsCatalog catalog, IDataView input, string outputColumnName, string inputColumnName,


BatchDetectAnomalyBySrCnn [](start = 32, length = 25)

Can this be renamed DetectAnomaliesBy...? #Resolved

yaeldekel · 2020-05-18T08:07:13Z

test/Microsoft.ML.TimeSeries.Tests/TimeSeriesDirectApi.cs

+            IDataView dataView;
+            if (loadDataFromFile)
+            {
+                var dataPath = GetDataPath(Path.Combine("Timeseries", "anomaly_detection.csv"));


Path.Combine( [](start = 43, length = 13)

I think there is an overload of GetDataPath which does Path.Combine for you. #Resolved

done.

In reply to: 426440599 [](ancestors = 426440599)

yaeldekel · 2020-05-18T08:08:29Z

test/Microsoft.ML.TimeSeries.Tests/TimeSeriesDirectApi.cs

+        {
+            var ml = new MLContext(1);
+            IDataView dataView;
+            if (loadDataFromFile)


loadDataFromFile [](start = 16, length = 16)

Why would there be a difference? Does that test different functionality of the new API? #Resolved

This is required by Harish to test both loading dataview from file and from List

In reply to: 426441364 [](ancestors = 426441364)

In one of the earlier editions of the API (when it used some non-primitive data types) I thought I saw some issues with whether the API would support loading from files cleanly. So I had asked for this test. It may not be required now. But if it is here already, it is is okay to leave it as is.

In reply to: 426707236 [](ancestors = 426707236,426441364)

yaeldekel · 2020-05-20T08:59:09Z

src/Microsoft.ML.TimeSeries/SrCnnEntireAnomalyDetector.cs

+                else if (results.Length > values.Length)
+                {
+                    Array.Resize<double[]>(ref results, values.Length);
+                }


Why do you need to do this? #Resolved

The last batch may have different size. We only borrows data when the last batch has less than 12 points, when it has more than 12, we will use the length as it is.

In reply to: 427851487 [](ancestors = 427851487)

yaeldekel · 2020-05-20T09:01:58Z

src/Microsoft.ML.TimeSeries/SrCnnEntireAnomalyDetector.cs

+
+            private double[] BackAdd(double[] data)
+            {
+                AllocateDoubleArray(ref _predictArray, _lookaheadWindowSize + 1);


_predictArray [](start = 40, length = 13)

This is always allocated to be the same size, and doesn't depend on the batch size. Can be readonly and instantiated in the constructor. #Resolved

done.

In reply to: 427853277 [](ancestors = 427853277)

yaeldekel · 2020-05-20T09:05:02Z

src/Microsoft.ML.TimeSeries/SrCnnEntireAnomalyDetector.cs

+                {
+                    arr = new double[length];
+                }
+                else if (arr.Length != length)


arr.Length != length [](start = 25, length = 20)

Similar question - why do you need this?
Is this because of the last batch that may be smaller than the rest? Can't you just ignore the last few elements in the arrays instead of resizing? (resizing allocates a new array). #Resolved

Explained in email

In reply to: 427855107 [](ancestors = 427855107)

yaeldekel · 2020-05-20T09:06:50Z

src/Microsoft.ML.TimeSeries/SrCnnEntireAnomalyDetector.cs

+            private void SpectralResidual(double[] values, double[][] results, double threshold)
+            {
+                // Step 1: Get backadd wave
+                double[] backAddList = BackAdd(values);


BackAdd [](start = 39, length = 7)

I think this puts the values in _backAddArray, it doesn't need to return it. #Resolved

Fixed this and other similar places.

In reply to: 427856187 [](ancestors = 427856187)

yaeldekel · 2020-05-20T09:08:03Z

src/Microsoft.ML.TimeSeries/SrCnnEntireAnomalyDetector.cs

+                AllocateDoubleArray(ref _fftRe, length);
+                AllocateDoubleArray(ref _fftIm, length);
+
+                FftUtils.ComputeForwardFft(backAddList, Enumerable.Repeat((double)0.0f, length).ToArray(), _fftRe, _fftIm, length);


Enumerable.Repeat((double)0.0f, length).ToArray() [](start = 56, length = 49)

Can you store a buffer of zeros as a field? #Resolved

Done.

In reply to: 427856973 [](ancestors = 427856973)

yaeldekel · 2020-05-20T09:09:32Z

src/Microsoft.ML.TimeSeries/SrCnnEntireAnomalyDetector.cs

+                AllocateDoubleArray(ref _fftRe, length);
+                AllocateDoubleArray(ref _fftIm, length);
+
+                FftUtils.ComputeForwardFft(backAddList, Enumerable.Repeat((double)0.0f, length).ToArray(), _fftRe, _fftIm, length);


ComputeForwardFft [](start = 25, length = 17)

I think you might be able to change this method to take in Span<double> instead of arrays, this way you won't have to resize, you can just take a subspan of the arrays, and the arrays will always be the same length. #Resolved

Explained in email

In reply to: 427857936 [](ancestors = 427857936)

yaeldekel · 2020-05-20T09:17:32Z

src/Microsoft.ML.TimeSeries/SrCnnEntireAnomalyDetector.cs

+                    _previousBatch = _previousBatch.GetRange(_batch.Count, bLen);
+                    _previousBatch.AddRange(_batch);
+                    _modeler.Train(_previousBatch.ToArray(), ref _results);
+                    _results = _results.Skip(bLen).ToArray();


Skip(bLen).ToArray() [](start = 40, length = 20)

Suggestion: Instead of skipping, keep an int indicating the "beginning" of _results, so the getter would return the value in _results[input.Position % _batchSize + _bLen] instead of _results[input.Position % _batchSize]. #Resolved

Fixed as you suggested.

In reply to: 427863039 [](ancestors = 427863039)

yaeldekel · 2020-05-20T09:20:20Z

src/Microsoft.ML.TimeSeries/SrCnnEntireAnomalyDetector.cs

+                        for (int i = 0; i < _outputLength; ++i)
+                        {
+                            result.Values[i] = _results[input.Position % _batchSize][i];
+                        }


Shorter alternative:

_results[...].CopyTo(result.Values) ``` #Resolved

Fixed as you suggested.

In reply to: 427864940 [](ancestors = 427864940)

yaeldekel · 2020-05-20T09:21:56Z

src/Microsoft.ML.TimeSeries/SrCnnEntireAnomalyDetector.cs

+            private void GetExpectedValue(double[] values, double[][] results)
+            {
+                //Step 8: Calculate Expected Value
+                var exps = CalculateExpectedValueByFft(GetDeanomalyData(values, GetAnomalyIndex(results.Select(x => x[1]).ToArray())));


CalculateExpectedValueByFft [](start = 27, length = 27)

Can return void, values are stored in a field. #Resolved

fixed.

In reply to: 427865834 [](ancestors = 427865834)

yaeldekel · 2020-05-20T09:24:27Z

src/Microsoft.ML.TimeSeries/SrCnnEntireAnomalyDetector.cs

+            private void GetMargin(double[] values, double[][] results, double sensitivity)
+            {
+                //Step 8: Calculate Expected Value
+                var exps = CalculateExpectedValueByFft(GetDeanomalyData(values, GetAnomalyIndex(results.Select(x => x[1]).ToArray())));


CalculateExpectedValueByFft [](start = 27, length = 27)

Return void. #Resolved

Fixed this and other similar places.

In reply to: 427867351 [](ancestors = 427867351)

yaeldekel · 2020-05-20T09:24:34Z

src/Microsoft.ML.TimeSeries/SrCnnEntireAnomalyDetector.cs

+                var exps = CalculateExpectedValueByFft(GetDeanomalyData(values, GetAnomalyIndex(results.Select(x => x[1]).ToArray())));
+
+                //Step 9: Calculate Boundary Unit
+                var units = CalculateBoundaryUnit(values, results.Select(x => x[0] > 0).ToArray());


CalculateBoundaryUnit [](start = 28, length = 21)

Same. #Resolved

Fixed.

In reply to: 427867423 [](ancestors = 427867423)

yaeldekel · 2020-05-20T09:27:42Z

src/Microsoft.ML.TimeSeries/SrCnnEntireAnomalyDetector.cs

+                var units = CalculateBoundaryUnit(values, results.Select(x => x[0] > 0).ToArray());
+
+                //Step 10: Calculate UpperBound and LowerBound
+                var margins = units.Select(x => CalculateMargin(x, sensitivity)).ToList();


CalculateMargin [](start = 48, length = 15)

I would do this inside the for loop:

for (int i = 0; i < results.Length; ++i) { var margin = CalculateMargin(units[i], sensitivity); results[i][3] = exps[i]; results[i][4] = units[i]; results[i][5] = exps[i] + margin; results[i][6] = exps[i] - margin; //Step 11: Update Anomaly Score results[i][1] = CalculateAnomalyScore(values[i], exps[i], units[i], results[i][0] > 0); } ``` #Resolved

Done.

In reply to: 427869378 [](ancestors = 427869378)

yaeldekel · 2020-05-20T09:28:35Z

src/Microsoft.ML.TimeSeries/SrCnnEntireAnomalyDetector.cs

+
+            private double[] GetDeanomalyData(double[] data, int[] anomalyIdxList)
+            {
+                double[] deAnomalyData = (double[])data.Clone();


double[] deAnomalyData = (double[])data.Clone(); [](start = 16, length = 48)

Can this be converted to a field similarly to the rest of the arrays this class uses? #Resolved

Fixed.

In reply to: 427869960 [](ancestors = 427869960)

harishsk

harishsk

yaeldekel · 2020-05-21T08:20:03Z

src/Microsoft.ML.TimeSeries/SrCnnEntireAnomalyDetector.cs

+                for (int i = 0; i < arr.Length; ++i)
+                {
+                    arr[i] = 0.0;
+                }


Values will be 0 by default. #Resolved

Since values are 0 by default, removed this fuction, use AllocateDoubleArray instead, they are equally.

In reply to: 428513049 [](ancestors = 428513049)

yaeldekel · 2020-05-21T08:26:28Z

src/Microsoft.ML.TimeSeries/SrCnnEntireAnomalyDetector.cs

+                if (_batch.Count < MinBatchSize)
+                {
+                    if (_previousBatch.Count + _batch.Count < MinBatchSize)
+                        return;


return [](start = 24, length = 6)

In what scenario does this happen?
What does it mean for the getter, doesn't it need to return something? It seems that if this happens it will just return the "old" values in _results. #Resolved

Changed the condition to _previousBatch.Count == 0 to make it easier to understand. This happens when batchSize is set to -1 but the input point count < 12. I changed the return to throw Exception here.
if (_previousBatch.Count == 0)
{
throw new InvalidOperationException("The input must contain no less than 12 points.");
}

In reply to: 428515960 [](ancestors = 428515960)

yaeldekel · 2020-05-21T17:26:14Z

src/Microsoft.ML.TimeSeries/SrCnnEntireAnomalyDetector.cs

+                {
+                    if (_previousBatch.Count == 0)
+                    {
+                        throw new InvalidOperationException("The input must contain no less than 12 points.");


InvalidOperationException [](start = 34, length = 25)

throw Contracts.Except... #Resolved

Fixed #Resolved

yaeldekel · 2020-05-21T17:27:16Z

src/Microsoft.ML.TimeSeries/SrCnnEntireAnomalyDetector.cs

+
+using System;
+using System.Collections.Generic;
+using System.IO;


IO [](start = 13, length = 2)

Is this needed? #Resolved

System.io is not needed. Fixed #Resolved

yaeldekel

gvashishtha · 2020-05-22T04:44:43Z

src/Microsoft.ML.TimeSeries/ExtensionsCatalog.cs

+        /// <param name="outputColumnName">Name of the column resulting from data processing of <paramref name="inputColumnName"/>.
+        /// The column data is a vector of <see cref="System.Double"/>. The length of this vector varies depending on <paramref name="detectMode"/>.</param>
+        /// <param name="inputColumnName">Name of column to process. The column data must be <see cref="System.Double"/>.</param>
+        /// <param name="threshold">The threshold to determine anomaly, score larger than the threshold is considered as anomaly. Must be in [0,1]. Default value is 0.3.</param>


Definition of threshold refers to "score," which I don't see defined anywhere

klausmh and others added 4 commits May 15, 2020 11:57

Draft PR for SrCnn batch detection API interface (#1)

7ea51d8

* POC Batch transform * SrCnn batch interface * Removed comment * Handled some APIreview comments. * Handled other review comments. * Resolved review comments. Added sample. Co-authored-by: Yael Dekel <[email protected]>

Merge remote-tracking branch 'upstream/master' into timeseries/sr_ent…

8951997

…ire_function

Implement SrCnn entire API by function

1f8e1e0

Fix bugs and add test

b8351ad

mengaims commented May 15, 2020

View reviewed changes

harishsk reviewed May 15, 2020

View reviewed changes

yaeldekel reviewed May 18, 2020

View reviewed changes

yaeldekel self-requested a review May 18, 2020 07:35

yaeldekel reviewed May 18, 2020

View reviewed changes

yaeldekel reviewed May 20, 2020

View reviewed changes

Move modeler initializing to CreateBatch and other minor fix.

ab99eea

mengaims marked this pull request as ready for review May 21, 2020 02:11

mengaims requested a review from a team as a code owner May 21, 2020 02:11

harishsk approved these changes May 21, 2020

View reviewed changes

yaeldekel reviewed May 21, 2020

View reviewed changes

mengaims and others added 2 commits May 21, 2020 21:22

Fix 3 remaining comments

df08f45

Fixed code analysis issue.

81ae699

yaeldekel reviewed May 21, 2020

View reviewed changes

yaeldekel approved these changes May 21, 2020

View reviewed changes

Fixed minor comments

3355c4a

harishsk changed the title ~~Add SrCnn entire API by implementing funciton~~ Add SrCnn entire API by implementing function May 21, 2020

harishsk merged commit a4af0ec into dotnet:master May 21, 2020

gvashishtha reviewed May 22, 2020

View reviewed changes

ghost locked as resolved and limited conversation to collaborators Mar 18, 2022


		protected override Batch InitializeBatch(DataViewRowCursor input) => new Batch(_batchSize, _outputLength, _modler);

		protected override Func<bool> GetIsNewBatchDelegate(DataViewRowCursor input)

Add SrCnn entire API by implementing function #5135

Add SrCnn entire API by implementing function #5135

Uh oh!

Conversation

mengaims commented May 15, 2020

Uh oh!

mengaims May 15, 2020 • edited by harishsk Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

harishsk May 15, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

klausmh May 15, 2020 • edited by harishsk Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

harishsk May 15, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

harishsk May 15, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

harishsk May 15, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yaeldekel May 18, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yaeldekel May 18, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yaeldekel May 18, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yaeldekel May 18, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yaeldekel May 18, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

mengaims May 15, 2020 •

edited by harishsk

Loading

harishsk May 15, 2020 •

edited

Loading

klausmh May 15, 2020 •

edited by harishsk

Loading

harishsk May 15, 2020 •

edited

Loading

harishsk May 15, 2020 •

edited

Loading

harishsk May 15, 2020 •

edited

Loading

yaeldekel May 18, 2020 •

edited

Loading

yaeldekel May 18, 2020 •

edited

Loading

yaeldekel May 18, 2020 •

edited

Loading

yaeldekel May 18, 2020 •

edited

Loading

yaeldekel May 18, 2020 •

edited

Loading

yaeldekel May 18, 2020 •

edited

Loading

yaeldekel May 18, 2020 •

edited

Loading

yaeldekel May 18, 2020 •

edited by harishsk

Loading

yaeldekel May 18, 2020 •

edited

Loading

yaeldekel May 18, 2020 •

edited by harishsk

Loading

yaeldekel May 18, 2020 •

edited by harishsk

Loading