Skip to content

IndexOutOfRange Exception in KeyToVector transformer #2681

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

Ivanidzo4ka
Copy link
Contributor

Fixes #2678

@Ivanidzo4ka Ivanidzo4ka changed the title Fix for 2678 IndexOutOfRange Exception in KeyToVector transformer Feb 21, 2019
@@ -573,7 +573,6 @@ private ValueGetter<VBuffer<float>> MakeGetterInd(DataViewRow input, int iinfo)
if (key >= (uint)size)
continue;
editor.Values[count] = 1;
editor.Indices[count++] = slot * size + (int)key;
Copy link
Member

@eerhardt eerhardt Feb 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test? #Resolved

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also appears that there are tests failing in CI with this change.

Maybe the right change is to use

/// <param name="requireIndicesOnDense">
/// True means to ensure the Indices buffer is available, even if the buffer will be dense.
/// </param>

@codecov
Copy link

codecov bot commented Feb 21, 2019

Codecov Report

Merging #2681 into master will decrease coverage by 1.08%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2681      +/-   ##
==========================================
- Coverage   72.78%   71.69%   -1.09%     
==========================================
  Files         808      810       +2     
  Lines      145588   142487    -3101     
  Branches    16250    16112     -138     
==========================================
- Hits       105964   102163    -3801     
- Misses      35202    35898     +696     
- Partials     4422     4426       +4
Flag Coverage Δ
#Debug 71.69% <100%> (-1.09%) ⬇️
#production 67.92% <100%> (-0.37%) ⬇️
#test 85.91% <100%> (-3.12%) ⬇️
Impacted Files Coverage Δ
...icrosoft.ML.Tests/Transformers/CategoricalTests.cs 100% <100%> (ø) ⬆️
src/Microsoft.ML.Data/Transforms/Hashing.cs 92.04% <100%> (-0.29%) ⬇️
src/Microsoft.ML.Data/Transforms/KeyToVector.cs 82.07% <100%> (+0.25%) ⬆️
...soft.ML.Tests/Transformers/CategoricalHashTests.cs 100% <100%> (ø) ⬆️
src/Microsoft.ML.Transforms/KeyToVectorMapping.cs 91.63% <100%> (+0.39%) ⬆️
...icrosoft.ML.Functional.Tests/Datasets/Sentiment.cs 0% <0%> (-100%) ⬇️
....ML.Functional.Tests/Datasets/HousingRegression.cs 14.28% <0%> (-85.72%) ⬇️
...rosoft.ML.Data/DataLoadSave/CompositeDataLoader.cs 28.57% <0%> (-63.1%) ⬇️
...ft.ML.StaticPipe/WordEmbeddingsStaticExtensions.cs 0% <0%> (-62.5%) ⬇️
src/Microsoft.ML.Transforms/ConversionsCatalog.cs 50% <0%> (-33.34%) ⬇️
... and 667 more

@TomFinley
Copy link
Contributor

TomFinley commented Mar 6, 2019

Hi @Ivanidzo4ka, is this relevant to #2185? It seems to be touching much of the same code I noted in that issue.

I mentioned you there but perhaps you did not see. At least, I do not see that there is a comment. Maybe you can review and see that all the issues I observed are actually fixed by this? I see some would be, but might be nice to be complete.

@@ -1273,7 +1273,7 @@ public SchemaShape GetOutputSchema(SchemaShape inputSchema)
metadata.Add(slotMeta);
if (colInfo.InvertHash != 0)
metadata.Add(new SchemaShape.Column(AnnotationUtils.Kinds.KeyValues, SchemaShape.Column.VectorKind.Vector, TextDataViewType.Instance, false));
result[colInfo.Name] = new SchemaShape.Column(colInfo.Name, col.ItemType is VectorType ? SchemaShape.Column.VectorKind.Vector : SchemaShape.Column.VectorKind.Scalar, NumberDataViewType.UInt32, true, new SchemaShape(metadata));
result[colInfo.Name] = new SchemaShape.Column(colInfo.Name, col.Kind, NumberDataViewType.UInt32, true, new SchemaShape(metadata));
}
Copy link
Member

@wschin wschin Mar 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean input shape must be scalar? If yes, we need to throw when encountering a vector. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. It means we accept vectors, scalars and varvectors


In reply to: 263166153 [](ancestors = 263166153)

@wschin
Copy link
Member

wschin commented Mar 6, 2019

                throw Host.ExceptSchemaMismatch(nameof(inputSchema), "input", colInfo.inputColumnName);

We don't we just check if the input is a key?


Refers to: src/Microsoft.ML.Transforms/KeyToVectorMapping.cs:461 in 540b139. [](commit_id = 540b139, deletion_comment = False)

@@ -466,7 +466,9 @@ public override SchemaShape GetOutputSchema(SchemaShape inputSchema)
metadata.Add(new SchemaShape.Column(AnnotationUtils.Kinds.SlotNames, SchemaShape.Column.VectorKind.Vector, keyMeta.ItemType, false));
if (col.Kind == SchemaShape.Column.VectorKind.Scalar)
metadata.Add(new SchemaShape.Column(AnnotationUtils.Kinds.IsNormalized, SchemaShape.Column.VectorKind.Scalar, BooleanDataViewType.Instance, false));
result[colInfo.outputColumnName] = new SchemaShape.Column(colInfo.outputColumnName, SchemaShape.Column.VectorKind.Vector, NumberDataViewType.Single, false, new SchemaShape(metadata));
result[colInfo.outputColumnName] = new SchemaShape.Column(colInfo.outputColumnName,
col.Kind == SchemaShape.Column.VectorKind.VariableVector ? SchemaShape.Column.VectorKind.VariableVector : SchemaShape.Column.VectorKind.Vector,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

col [](start = 20, length = 3)

This is key-to-vector mapping. Can the input column be a vector-valued?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can have vector of keys, right?


In reply to: 263169008 [](ancestors = 263169008)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact a vector of keys is one of the most important scenarios (e.g., bag of words).

@@ -1273,7 +1273,7 @@ public SchemaShape GetOutputSchema(SchemaShape inputSchema)
metadata.Add(slotMeta);
if (colInfo.InvertHash != 0)
metadata.Add(new SchemaShape.Column(AnnotationUtils.Kinds.KeyValues, SchemaShape.Column.VectorKind.Vector, TextDataViewType.Instance, false));
result[colInfo.Name] = new SchemaShape.Column(colInfo.Name, col.ItemType is VectorType ? SchemaShape.Column.VectorKind.Vector : SchemaShape.Column.VectorKind.Scalar, NumberDataViewType.UInt32, true, new SchemaShape(metadata));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

col.ItemType is VectorType ? SchemaShape.Column.VectorKind.Vector : SchemaShape.Column.VectorKind [](start = 76, length = 97)

Geez, the original author sure got confused.

metadata.Add(new SchemaShape.Column(AnnotationUtils.Kinds.SlotNames, SchemaShape.Column.VectorKind.Vector, keyMeta.ItemType, false));
if (!colInfo.Bag && (col.Kind == SchemaShape.Column.VectorKind.Scalar || col.Kind == SchemaShape.Column.VectorKind.Vector))
metadata.Add(new SchemaShape.Column(AnnotationUtils.Kinds.CategoricalSlotRanges, SchemaShape.Column.VectorKind.Vector, NumberDataViewType.Int32, false));
if (!colInfo.Bag || (col.Kind == SchemaShape.Column.VectorKind.Scalar))
metadata.Add(new SchemaShape.Column(AnnotationUtils.Kinds.IsNormalized, SchemaShape.Column.VectorKind.Scalar, BooleanDataViewType.Instance, false));

result[colInfo.Name] = new SchemaShape.Column(colInfo.Name, SchemaShape.Column.VectorKind.Vector, NumberDataViewType.Single, false, new SchemaShape(metadata));
result[colInfo.Name] = new SchemaShape.Column(colInfo.Name,
col.Kind == SchemaShape.Column.VectorKind.VariableVector && !colInfo.Bag ? SchemaShape.Column.VectorKind.VariableVector : SchemaShape.Column.VectorKind.Vector,
Copy link
Contributor

@TomFinley TomFinley Mar 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

col.Kind == SchemaShape.Column.VectorKind.VariableVector && !colInfo.Bag ? SchemaShape.Column.VectorKind.VariableVector : SchemaShape.Column.VectorKind.Vector [](start = 20, length = 158)

Um... yeah.if you wanted to explode out this condition a bit with a bunch of clarifying comments I wouldn't object to this becoming more than a one-line statement. I found this slightly hard to unpack. (Correct, I think, just hard to unpack.)

@TomFinley
Copy link
Contributor

Hi @Ivanidzo4ka, is this relevant to #2185? It seems to be touching much of the same code I noted in that issue.

Hi, @Ivanidzo4ka, wondering if you've had a chance to give this any thought or attention.

@eerhardt
Copy link
Member

eerhardt commented May 3, 2019

@Ivanidzo4ka - what's the status of this PR? Can it be updated?

@Ivanidzo4ka
Copy link
Contributor Author

@eerhardt I've update it.

@@ -1,14 +0,0 @@
#@ TextLoader{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the SingleDebug versions of these two baseline files be deleted as well?

new InputOutputColumnPair("TermA", "A"),
new InputOutputColumnPair("TermB", "B"),
new InputOutputColumnPair("TermC", "C")
new OneHotEncodingEstimator.ColumnOptions("CatA", "A"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OneHotEncodingEstimator.ColumnOptions is an internal class. Can we instead write the tests using the public API?

@codemzs
Copy link
Member

codemzs commented May 22, 2019

closing this PR since #3763 has these changes in it. I had initially created that PR not knowing there was this PR but then @yaeldekel pointed out this PR also takes care of output schema so I took those changes. I need to check-in my PR as it is blocking 1P customer.

@codemzs codemzs closed this May 22, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unintuitive error when giving array to OneHotEncoder
5 participants