-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Add SrCnn entire API by implementing function #5135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SrCnn entire API by implementing function #5135
Conversation
* POC Batch transform * SrCnn batch interface * Removed comment * Handled some APIreview comments. * Handled other review comments. * Resolved review comments. Added sample. Co-authored-by: Yael Dekel <[email protected]>
if (batchSize == -1) | ||
{ | ||
_batchSize = (int)input.GetRowCount(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the transformer version, we allow batchSize to be -1, means using the whole data set as one batch. Here I try to get row count from input when batchSize=-1 and set an actual positive batchSize. But I found that if we load dataview by text file loader, we could not get this count by this way. Is there any other way to achieve this? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ML.NET does lazy evaluation of most things. So the textloader may not know the number of rows ahead of time. When batchSize is -1, you will be computing the anomaly on the whole file and so you are going to read the whole file before doing your computation. You will need to pass the -1 value of _batchSize further and calculate the actual row count after you have read all the data and before computation begins.
In reply to: 425803839 [](ancestors = 425803839)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. I commented the same thing above before I noticed your comment.
If you need the number of rows in the data, you can compute it here in the ctor, by getting a row cursor with no active columns and counting how many times MoveNext()
returns true. So you can do something like this:
var rowCount = input.GetRowCount();
if (rowCount!=null)
_batchSize = rowCount;
else
{
_batchSize = 0;
using (var cursor = input.GetRowCursor())
{
while (cursor.MoveNext())
_batchSize++;
}
}
In reply to: 426029554 [](ancestors = 426029554,425803839)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, you would need to check for overflow in this case, since the row count can be greater than int.MaxValue
.
In reply to: 426407146 [](ancestors = 426407146,426029554,425803839)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some concern about count batchSize here since this may introduce an extra traversing. Could we allow batchSize=-1 when initialize Batch object, and in GetIsNewBatchDelegate and GetLastInBatchDelegate, add a special condition for _batchSize=1? In Batch class, since data will be read in later, there is no need to get batchSize from outside when no batch is actually used.
In reply to: 426407783 [](ancestors = 426407783,426407146,426029554,425803839)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, please check.
In reply to: 426719317 [](ancestors = 426719317,426407783,426407146,426029554,425803839)
/// </format> | ||
/// </example> | ||
public static IDataView BatchDetectAnomalyBySrCnn(this DataOperationsCatalog catalog, IDataView input, string outputColumnName, string inputColumnName, | ||
double threshold = 0.3, int batchSize = 1024, double sensitivity = 99, SrCnnDetectMode detectMode = SrCnnDetectMode.AnomalyOnly) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be in the TimeSeries catalog? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was the suggestion from @yaeldekel to put it here #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yaeldekel, Can you please elaborate? We already have a TimeSeries catalog. Shouldn't it belong there?
In reply to: 426065077 [](ancestors = 426065077)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In previous iterations it was in the TransformsCatalog
which is why I suggested moving it. TimeSeriesCatalog
would also work.
In reply to: 426078156 [](ancestors = 426078156,426065077)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that RCA used a newly defined AnomalyDetectionCatalog, maybe it would be better to align with it?
In reply to: 426439074 [](ancestors = 426439074,426078156,426065077)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, Please move it to the AnomalyDetectionCatalog
In reply to: 427140758 [](ancestors = 427140758,426439074,426078156,426065077)
private readonly string _inputColumnName; | ||
private readonly int _outputLength; | ||
private readonly SrCnnEntireModeler _modler; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: _modeler #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ | ||
_outputLength = AnomalyOnlyOutputLength; | ||
_modler = new SrCnnEntireModeler(threshold, sensitivity, detectMode, _outputLength); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: This line is common to all the if blocks and can be moved out of the if blocks. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private const int AnomalyOnlyOutputLength = 3; | ||
private const int AnomalyAndExpectedValueOutputLength = 4; | ||
private const int AnomalyAndMarginOutputLength = 7; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: You can put this in an array to match the index of SrCnnDetectMode and reference the array when initializing the value of _outputLength in the constructor #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
|
||
public SrCnnBatchAnomalyDetector(IHostEnvironment env, IDataView input, string inputColumnName, string outputColumnName, double threshold, int batchSize, double sensitivity, SrCnnDetectMode detectMode) | ||
: base(env, "SrCnnBatchAnomalyDetector", input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SrCnnBatchAnomalyDetector [](start = 25, length = 25)
Please use nameof(SrCnnBatchAnomalyDetector)
. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public DataViewSchema Schema => SchemaBindings.AsSchema; | ||
|
||
private readonly IDataView _source; | ||
private readonly IHost _host; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_host [](start = 31, length = 5)
Make this protected, and use in the deriving class. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public SrCnnBatchAnomalyDetector(IHostEnvironment env, IDataView input, string inputColumnName, string outputColumnName, double threshold, int batchSize, double sensitivity, SrCnnDetectMode detectMode) | ||
: base(env, "SrCnnBatchAnomalyDetector", input) | ||
{ | ||
Contracts.CheckValue(env, nameof(env)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CheckValue [](start = 22, length = 10)
This check should be done in the base class. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ | ||
Contracts.CheckValue(env, nameof(env)); | ||
|
||
Contracts.CheckValue(inputColumnName, nameof(inputColumnName)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Contracts [](start = 12, length = 9)
The IHost
in the base class should be protected, and it should be used here. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Contracts.CheckValue(inputColumnName, nameof(inputColumnName)); | ||
_inputColumnName = inputColumnName; | ||
|
||
env.CheckUserArg(batchSize == -1 || batchSize >= MinBatchSize, nameof(batchSize), "BatchSize must be -1 or no less than 12."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
env [](start = 12, length = 3)
Use IHost
from base class. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
env.CheckUserArg(batchSize == -1 || batchSize >= MinBatchSize, nameof(batchSize), "BatchSize must be -1 or no less than 12."); | ||
if (batchSize == -1) | ||
{ | ||
_batchSize = (int)input.GetRowCount(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetRowCount [](start = 40, length = 11)
That will most likely not work, since most implementations of IDataView
return null
for this method. #Resolved
|| detectMode == SrCnnDetectMode.AnomalyAndExpectedValue | ||
|| detectMode == SrCnnDetectMode.AnomalyAndMargin, nameof(detectMode), "Invalid detectMode"); | ||
|
||
if (detectMode.Equals(SrCnnDetectMode.AnomalyOnly)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AnomalyOnly [](start = 50, length = 11)
Since this is an enum, this can be done using a switch
statement. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As commented above, you don't need an if or a switch statement if you store the possible values for _outputLength in an const array whose indices match the the enum values.
In reply to: 426408554 [](ancestors = 426408554)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
protected override Batch InitializeBatch(DataViewRowCursor input) => new Batch(_batchSize, _outputLength, _modler); | ||
|
||
protected override Func<bool> GetIsNewBatchDelegate(DataViewRowCursor input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetIsNewBatchDelegate [](start = 38, length = 21)
I think this method and GetLastInBatchDelegate
can be implemented in the base class, I can't imagine a future implementation of the base class that would use a different way of computing this. @harishsk, what do you think? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since a check for _batchSize==-1 is added here, I think it's proper to keep it in inherit class
In reply to: 426409875 [](ancestors = 426409875)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using -1 as the "entire data view" is reasonable. But if @mengaims feels it is proper to keep it in the inherited class, I am okay with it since we only have one derived class using it now. If we get another inherited class, we can revisit this then and refactor if necessary. Since this is an internal implementation detail, there shouldn't be any issues.
In reply to: 427278034 [](ancestors = 427278034,427135642,426409875)
|
||
protected override Func<int, bool> GetSchemaBindingDependencies(Func<int, bool> predicate) | ||
{ | ||
return (SchemaBindings as Bindings).GetDependencies(predicate); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SchemaBindings [](start = 20, length = 14)
Suggestion: Add a private field of type Bindings
:
private readonly Bindings _bindings;
and then the override of SchemaBindings
would be:
public override ColumnBindingsBase SchemaBindings => _bindings;
and this method would return _bindings.GetDependencies(predicate)
. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private readonly int _batchSize; | ||
private readonly int _outputLength; | ||
private SrCnnEntireModeler _modler; | ||
private double[][] _results; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_results [](start = 31, length = 8)
Why not make this readonly? That way you don't have to allocate a new array for every batch. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to the scenario that bachSize=-1, and we do not know the exact row count until we read in data, results could not be allocated when initialize Batch.
In reply to: 426411747 [](ancestors = 426411747)
} | ||
} | ||
|
||
public sealed class SrCnnEntireModeler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public [](start = 8, length = 6)
Can this be private? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is similar as Batch class, used by protect methods as parameter in SrCnnBatchAnomalyDetector, if make them private, the accessasiblity will not be aligned and there will be compile errors. I've made them internal.
In reply to: 426828580 [](ancestors = 426828580,426413214)
_threshold = threshold; | ||
_sensitivity = sensitivity; | ||
_detectMode = detectMode; | ||
_outputLength = outputLength; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_outputLength [](start = 16, length = 13)
_outputLength
is determined by _detectMode
, so you can either use the array in the wrapping class like Harish suggested, or check that _outputLength
and _detectMode
are consistent. (actually, if you make this a private class, then an assert instead of a check would be enough). #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
public double[][] Train(double[] values) | ||
{ | ||
double[][] results = new double[values.Length][]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
results [](start = 27, length = 7)
Pass in an array so you don't have to allocate a new one on every call. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, then should it be a ref
input to the method? that way you allocate a new array only if it's null.
There are two cases: when batchSize=-1
, and when batchSize>0
. In the first case, you allocate only once anyway since you call this method only after the entire pass over the data. In the second case, you would allocate new arrays for each batch, wouldn't it improve perf to only allocate once?
In reply to: 427138599 [](ancestors = 427138599,426418976)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed #Resolved
return results; | ||
} | ||
|
||
private static void SpecturalResidual(double[] values, double[][] results, double threshold) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SpecturalResidual [](start = 32, length = 17)
Should this be SpectralResidual
? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
// Step 2: FFT transformation | ||
int length = backAddList.Length; | ||
double[] fftRe = new double[length]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fftRe [](start = 25, length = 5)
Is there a way to make these fields of the class so they are only allocated once? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make this class contain buffers for all these computations to avoid multiple allocations of arrays. This would mean that the SrCnnBatchAnomalyDetector
cannot have an SrcEntireModeler
as a field, and the InitializeBatch
method would have to instantiate it (IDataVew
s should be thread safe, since it should be possible to create multiple row cursors on multiple threads from an IDataView
object).
In reply to: 426420411 [](ancestors = 426420411)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the problem here is the same as results array, due to the batchSizxe=-1 scenario, we will not know the length of these arrays until we actually read in data.
In reply to: 426431891 [](ancestors = 426431891,426420411)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allocating a bunch of arrays for each batch might have a perf price, with the GC going into overdrive. Consider allocating them only once and re-using them for subsequent batches.
In reply to: 427139611 [](ancestors = 427139611,426431891,426420411)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If every instance of SrEntireModeler
contained preallocated buffers, would the result not be the same (since InitializeBatch
would then have to allocate a new instance of SrEntireModeler
)? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because InitializeBatch
is called only once, when the row cursor is created (see line 100 in BatchDataViewMapperBase.cs). The SrEntireModeler
methods (Train
and the helper methods that are called from it) are called after every batch is read from the input cursor (see line 168 in BatchDataViewMapperBase.cs).
In reply to: 427614415 [](ancestors = 427614415)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps InitializeBatch
should be renamed CreateBatch
to avoid confusion?
In reply to: 427764070 [](ancestors = 427764070,427614415)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return cumSumList; | ||
} | ||
|
||
private static double CalculateSocre(double mag, double avgMag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Soc [](start = 43, length = 3)
Typo. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using System.Linq; | ||
using Microsoft.ML.Data; | ||
using Microsoft.ML.Data.DataView; | ||
using Microsoft.ML.Numeric; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Numeric [](start = 19, length = 7)
Is this used? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// ]]> | ||
/// </format> | ||
/// </example> | ||
public static IDataView BatchDetectAnomalyBySrCnn(this DataOperationsCatalog catalog, IDataView input, string outputColumnName, string inputColumnName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BatchDetectAnomalyBySrCnn [](start = 32, length = 25)
Can this be renamed DetectAnomaliesBy...
? #Resolved
IDataView dataView; | ||
if (loadDataFromFile) | ||
{ | ||
var dataPath = GetDataPath(Path.Combine("Timeseries", "anomaly_detection.csv")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Path.Combine( [](start = 43, length = 13)
I think there is an overload of GetDataPath
which does Path.Combine
for you. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ | ||
var ml = new MLContext(1); | ||
IDataView dataView; | ||
if (loadDataFromFile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loadDataFromFile [](start = 16, length = 16)
Why would there be a difference? Does that test different functionality of the new API? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is required by Harish to test both loading dataview from file and from List
In reply to: 426441364 [](ancestors = 426441364)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In one of the earlier editions of the API (when it used some non-primitive data types) I thought I saw some issues with whether the API would support loading from files cleanly. So I had asked for this test. It may not be required now. But if it is here already, it is is okay to leave it as is.
In reply to: 426707236 [](ancestors = 426707236,426441364)
else if (results.Length > values.Length) | ||
{ | ||
Array.Resize<double[]>(ref results, values.Length); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to do this? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last batch may have different size. We only borrows data when the last batch has less than 12 points, when it has more than 12, we will use the length as it is.
In reply to: 427851487 [](ancestors = 427851487)
|
||
private double[] BackAdd(double[] data) | ||
{ | ||
AllocateDoubleArray(ref _predictArray, _lookaheadWindowSize + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_predictArray [](start = 40, length = 13)
This is always allocated to be the same size, and doesn't depend on the batch size. Can be readonly and instantiated in the constructor. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ | ||
arr = new double[length]; | ||
} | ||
else if (arr.Length != length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arr.Length != length [](start = 25, length = 20)
Similar question - why do you need this?
Is this because of the last batch that may be smaller than the rest? Can't you just ignore the last few elements in the arrays instead of resizing? (resizing allocates a new array). #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private void SpectralResidual(double[] values, double[][] results, double threshold) | ||
{ | ||
// Step 1: Get backadd wave | ||
double[] backAddList = BackAdd(values); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BackAdd [](start = 39, length = 7)
I think this puts the values in _backAddArray
, it doesn't need to return it. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AllocateDoubleArray(ref _fftRe, length); | ||
AllocateDoubleArray(ref _fftIm, length); | ||
|
||
FftUtils.ComputeForwardFft(backAddList, Enumerable.Repeat((double)0.0f, length).ToArray(), _fftRe, _fftIm, length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enumerable.Repeat((double)0.0f, length).ToArray() [](start = 56, length = 49)
Can you store a buffer of zeros as a field? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AllocateDoubleArray(ref _fftRe, length); | ||
AllocateDoubleArray(ref _fftIm, length); | ||
|
||
FftUtils.ComputeForwardFft(backAddList, Enumerable.Repeat((double)0.0f, length).ToArray(), _fftRe, _fftIm, length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ComputeForwardFft [](start = 25, length = 17)
I think you might be able to change this method to take in Span<double>
instead of arrays, this way you won't have to resize, you can just take a subspan of the arrays, and the arrays will always be the same length. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_previousBatch = _previousBatch.GetRange(_batch.Count, bLen); | ||
_previousBatch.AddRange(_batch); | ||
_modeler.Train(_previousBatch.ToArray(), ref _results); | ||
_results = _results.Skip(bLen).ToArray(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skip(bLen).ToArray() [](start = 40, length = 20)
Suggestion: Instead of skipping, keep an int
indicating the "beginning" of _results
, so the getter would return the value in _results[input.Position % _batchSize + _bLen]
instead of _results[input.Position % _batchSize]
. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for (int i = 0; i < _outputLength; ++i) | ||
{ | ||
result.Values[i] = _results[input.Position % _batchSize][i]; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shorter alternative:
_results[...].CopyTo(result.Values)
``` #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private void GetExpectedValue(double[] values, double[][] results) | ||
{ | ||
//Step 8: Calculate Expected Value | ||
var exps = CalculateExpectedValueByFft(GetDeanomalyData(values, GetAnomalyIndex(results.Select(x => x[1]).ToArray()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CalculateExpectedValueByFft [](start = 27, length = 27)
Can return void, values are stored in a field. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private void GetMargin(double[] values, double[][] results, double sensitivity) | ||
{ | ||
//Step 8: Calculate Expected Value | ||
var exps = CalculateExpectedValueByFft(GetDeanomalyData(values, GetAnomalyIndex(results.Select(x => x[1]).ToArray()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CalculateExpectedValueByFft [](start = 27, length = 27)
Return void. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var exps = CalculateExpectedValueByFft(GetDeanomalyData(values, GetAnomalyIndex(results.Select(x => x[1]).ToArray()))); | ||
|
||
//Step 9: Calculate Boundary Unit | ||
var units = CalculateBoundaryUnit(values, results.Select(x => x[0] > 0).ToArray()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CalculateBoundaryUnit [](start = 28, length = 21)
Same. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var units = CalculateBoundaryUnit(values, results.Select(x => x[0] > 0).ToArray()); | ||
|
||
//Step 10: Calculate UpperBound and LowerBound | ||
var margins = units.Select(x => CalculateMargin(x, sensitivity)).ToList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CalculateMargin [](start = 48, length = 15)
I would do this inside the for loop:
for (int i = 0; i < results.Length; ++i)
{
var margin = CalculateMargin(units[i], sensitivity);
results[i][3] = exps[i];
results[i][4] = units[i];
results[i][5] = exps[i] + margin;
results[i][6] = exps[i] - margin;
//Step 11: Update Anomaly Score
results[i][1] = CalculateAnomalyScore(values[i], exps[i], units[i], results[i][0] > 0);
}
``` #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
private double[] GetDeanomalyData(double[] data, int[] anomalyIdxList) | ||
{ | ||
double[] deAnomalyData = (double[])data.Clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double[] deAnomalyData = (double[])data.Clone(); [](start = 16, length = 48)
Can this be converted to a field similarly to the rest of the arrays this class uses? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for (int i = 0; i < arr.Length; ++i) | ||
{ | ||
arr[i] = 0.0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Values will be 0 by default. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since values are 0 by default, removed this fuction, use AllocateDoubleArray instead, they are equally.
In reply to: 428513049 [](ancestors = 428513049)
if (_batch.Count < MinBatchSize) | ||
{ | ||
if (_previousBatch.Count + _batch.Count < MinBatchSize) | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return [](start = 24, length = 6)
In what scenario does this happen?
What does it mean for the getter, doesn't it need to return something? It seems that if this happens it will just return the "old" values in _results
. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the condition to _previousBatch.Count == 0 to make it easier to understand. This happens when batchSize is set to -1 but the input point count < 12. I changed the return to throw Exception here.
if (_previousBatch.Count == 0)
{
throw new InvalidOperationException("The input must contain no less than 12 points.");
}
In reply to: 428515960 [](ancestors = 428515960)
{ | ||
if (_previousBatch.Count == 0) | ||
{ | ||
throw new InvalidOperationException("The input must contain no less than 12 points."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InvalidOperationException [](start = 34, length = 25)
throw Contracts.Except...
#Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed #Resolved
|
||
using System; | ||
using System.Collections.Generic; | ||
using System.IO; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IO [](start = 13, length = 2)
Is this needed? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
System.io is not needed. Fixed #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// <param name="outputColumnName">Name of the column resulting from data processing of <paramref name="inputColumnName"/>. | ||
/// The column data is a vector of <see cref="System.Double"/>. The length of this vector varies depending on <paramref name="detectMode"/>.</param> | ||
/// <param name="inputColumnName">Name of column to process. The column data must be <see cref="System.Double"/>.</param> | ||
/// <param name="threshold">The threshold to determine anomaly, score larger than the threshold is considered as anomaly. Must be in [0,1]. Default value is 0.3.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definition of threshold refers to "score," which I don't see defined anywhere
We are excited to review your PR.
So we can do the best job, please check:
Fixes #nnnn
in your description to cause GitHub to automatically close the issue(s) when your PR is merged.