Expression estimator/transformer #4548

yaeldekel · 2019-12-08T12:34:36Z

Fixes #4015.

codecov · 2019-12-08T13:36:23Z

Codecov Report

Merging #4548 into master will increase coverage by 0.35%.
The diff coverage is 83.08%.

@@            Coverage Diff             @@
##           master    #4548      +/-   ##
==========================================
+ Coverage   75.12%   75.48%   +0.35%     
==========================================
  Files         913      934      +21     
  Lines      160855   167656    +6801     
  Branches    17303    18129     +826     
==========================================
+ Hits       120848   126550    +5702     
- Misses      35165    36067     +902     
- Partials     4842     5039     +197

| Flag | Coverage Δ | |
| --- | --- | --- |
| #Debug | `75.48% <83.08%> (+0.35%)` | ⬆️ |
| #production | `71.12% <82.55%> (+0.61%)` | ⬆️ |
| #test | `90.3% <89.35%> (ø)` | ⬆️ |

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| ...L.Tests/Transformers/ExpressionTransformerTests.cs | `100% <ø> (ø)` | |
| ...icrosoft.ML.TestFramework/DataPipe/TestDataPipe.cs | `91.97% <100%> (+0.7%)` | ⬆️ |
| src/Microsoft.ML.Transforms/Expression/Error.cs | `100% <100%> (ø)` | |
| src/Microsoft.ML.Transforms/ExpressionCatalog.cs | `100% <100%> (ø)` | |
| src/Microsoft.ML.Transforms/Expression/Printer.cs | `32.03% <32.03%> (ø)` | |
| src/Microsoft.ML.Transforms/Expression/TokKind.cs | `32.25% <32.25%> (ø)` | |
| src/Microsoft.ML.Transforms/Expression/Exec.cs | `57.14% <57.14%> (ø)` | |
| ...Microsoft.ML.Transforms/Expression/LexCharUtils.cs | `58.87% <58.87%> (ø)` | |
| src/Microsoft.ML.Transforms/Expression/Lexer.cs | `61.58% <61.58%> (ø)` | |
| src/Microsoft.ML.Transforms/Expression/Tokens.cs | `68.49% <68.49%> (ø)` | |

... and 40 more

justinormont · 2019-12-23T17:30:47Z

docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/Expression.cs

+
+            // A pipeline that applies various expressions to the input columns.
+            var pipeline = mlContext.Transforms.Expression("Expr1", "(x,y)=>log(y)+x", "Float", "FloatVector")
+                .Append(mlContext.Transforms.Expression("Expr2", "(b,s,i)=>b ? len(s) : i", "Boolean", "StringVector", "Int"))


When I first read this, I thought (incorrectly) that we had to specify the data types of the inputs/outputs. Can we use names that aren't data types (e.g., HasSiblings) or longer names (e.g., BooleanExampleColumn)?

Suggested change

- .Append(mlContext.Transforms.Expression("Expr2", "(b,s,i)=>b ? len(s) : i", "Boolean", "StringVector", "Int"))
+ .Append(mlContext.Transforms.Expression("Expr2", "(b,s,i)=>b ? len(s) : i", "HasSiblings", "PreferredGreetings", "Age"))

#Resolved

Thanks for fixing. #Resolved

justinormont · 2019-12-23T17:40:27Z

src/Microsoft.ML.Transforms/Expression/BuiltinFunctions.cs

+            if (a == 0)
+            {
+                if (b == 0)
+                    return 1;


Seems in line with other implementations.
Had to look up: https://en.wikipedia.org/wiki/Zero_to_the_power_of_zero#IEEE_floating-point_standard #Resolved

This is documented in the md file (line 57). Do you think it needs to be made clearer?

In reply to: 360956933

No, looks good to me.
I was mainly noting it so others don't have to look up the conventional result of 0^0. #Resolved
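The 0^0 convention is easy to see in a small sketch. The following C# is a minimal illustration, not the code in BuiltinFunctions.cs; the `PowSketch` class and `IntPow` method are hypothetical names, and it assumes 64-bit integer operands and a non-negative exponent. Its `checked` arithmetic makes overflow raise `OverflowException` rather than wrap silently.

```csharp
using System;

public static class PowSketch
{
    // Hypothetical helper (not the ML.NET implementation): integer power by
    // repeated squaring, following the convention 0^0 == 1.
    public static long IntPow(long a, long b)
    {
        if (b < 0)
            throw new ArgumentOutOfRangeException(nameof(b), "This sketch handles only non-negative exponents.");

        if (a == 0)
            return b == 0 ? 1 : 0; // 0^0 == 1; 0^b == 0 for b > 0

        long result = 1;
        long factor = a;
        while (b > 0)
        {
            // checked: overflow throws OverflowException instead of wrapping.
            if ((b & 1) == 1)
                result = checked(result * factor);
            b >>= 1;
            // Square only if more exponent bits remain, avoiding an unneeded
            // (and possibly overflowing) final squaring.
            if (b > 0)
                factor = checked(factor * factor);
        }
        return result;
    }
}
```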

justinormont · 2019-12-23T17:55:40Z

src/Microsoft.ML.Transforms/Expression/BuiltinFunctions.cs

+            }
+            catch (OverflowException)
+            {
+                throw Contracts.Except("Overflow");


Would we be better off with saturation arithmetic instead of throwing on overflow? This might be more of a PM-ish question: is it better for an ML package to throw on semi-bad input, or should it just do its best? I'm worried that this will throw in production when a model receives an unexpectedly large input (ML data is dirty).

Suggested change

- throw Contracts.Except("Overflow");
+ res = (neg ? Int64.MinValue : Int64.MaxValue);
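A minimal C# sketch of the saturating alternative, assuming a hypothetical `SaturatingMultiply` helper (not part of ML.NET or of this PR): the product is computed in a `checked` context, and on overflow the result is clamped to `Int64.MinValue` or `Int64.MaxValue` depending on the sign of the true product, mirroring the `neg ? Int64.MinValue : Int64.MaxValue` suggestion.

```csharp
using System;

public static class SaturationSketch
{
    // Hypothetical helper: multiplies two longs and, instead of throwing on
    // overflow, clamps the result to Int64.MinValue or Int64.MaxValue.
    public static long SaturatingMultiply(long a, long b)
    {
        try
        {
            return checked(a * b);
        }
        catch (OverflowException)
        {
            // If overflow occurred, neither operand is zero, so the true
            // product is negative exactly when the signs differ.
            bool neg = (a < 0) != (b < 0);
            return neg ? Int64.MinValue : Int64.MaxValue;
        }
    }
}
```

The trade-off is that a saturated value is silently wrong, so downstream features can still be skewed, whereas throwing makes the bad input visible at the point where it occurs.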

justinormont · 2019-12-23T18:01:25Z

src/Microsoft.ML.Transforms/Expression/CharCursor.cs

+            // If someone is peeking at ich, they should have peeked everything up to ich.
+            Contracts.Assert(0 < dich && dich <= _ichLim - _ichNext + 1);
+
+            int ich = dich + _ichNext - 1;


Are our variable names in German?

"ich" => "I"

"dich" => "you"

Nothing to fix. Just an interesting note.

justinormont · 2019-12-23T18:37:26Z

src/Microsoft.ML.Transforms/Expression/TokKind.cs

+        From,
+        All,
+        Where,
+        Convolve,


Are the Net# tokens being kept for internal compatibility? Otherwise we could remove the unused tokens.

Resolved in follow-up PR: https://github.com/dotnet/machinelearning/pull/4614/files#r362319272

#Resolved

justinormont · 2019-12-23T19:38:10Z

test/Microsoft.ML.Tests/Transformers/ExpressionTransformerTests.cs

+            Assert.Equal(7, transformed.Schema["Expr6"].Type.GetValueCount());
+        }
+    }
+}


Thanks for the riveting 406-page PR! Truly epic.

justinormont · 2019-12-23T19:47:09Z

src/Microsoft.ML.Transforms/Expression/BuiltinFunctions.cs

+        [MethodImpl(MethodImplOptions.AggressiveInlining)]
+        public static R4 Exp(R4 a)
+        {
+            return (R4)Math.Exp(a);


How difficult would it be to output an ONNX graph instead of (or in addition to) building up .NET?

Background:
The AutoML team is looking to support basic math functionality like LN(x), x-y, and EXP(x). They could use the expression transform, though they need ONNX support. By using the expression transform, I'm hoping to avoid creating three new transforms for these operations, and more generally to avoid an endlessly growing list of tiny, single-function math transforms.

Issue tracked: #4615

justinormont · 2019-12-23T19:51:26Z

docs/api-reference/expression-estimator.md

+
+## Operators
+
+The operators of the expression language are listed in the following table, in precedence order. Unless otherwise noted,


We should mention how the expression handles vector columns. Hopefully we still operate on them :)

codemzs · 2019-12-26T18:25:36Z

docs/api-reference/expression-estimator.md

+
+| **Name** |  **Meaning** |  **Comments** |
+| --- | --- | --- |
+| bool | convert to BL | The operand must be text or boolean. |


BL

I don't think we have these types in our public API.

Resolved in follow-up PR: https://github.com/dotnet/machinelearning/pull/4614/files#r362319131

#Resolved

codemzs · 2019-12-26T18:27:17Z

src/Microsoft.ML.Transforms/ExpressionCatalog.cs

+        /// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnNames"/>.
+        /// This column's data type will be the same as that of the input column.</param>
+        /// <param name="expression">The expression to apply to <paramref name="inputColumnNames"/> to create the column <paramref name="outputColumnName"/>.</param>
+        /// <param name="inputColumnNames">The names of the input columns.</param>


/// The names of the input columns.

Please add a link to the sample.

Resolved in follow-up PR: https://github.com/dotnet/machinelearning/pull/4614/files#r362319272

#Resolved

yaeldekel requested a review from a team as a code owner December 8, 2019 12:34

yaeldekel requested a review from a team as a code owner December 23, 2019 14:00

yaeldekel force-pushed the expr branch from a78ea79 to 25587cf December 23, 2019 14:09

justinormont reviewed Dec 23, 2019

yaeldMS added 9 commits December 23, 2019 19:57

Expression transform (37a049d)

Add API and unit tests (be29e10)

Change ExprType to ExprTypeKind (2c9ea50)

Fix tests for Linux (04d2b93)

Fix unit tests (e8f3491)

change file header (1b3ab93)

Fix test failure on Linux (a61e689)

Sample and documentation (7c61a37)

change sample (7db4fe9)

yaeldekel force-pushed the expr branch from 25587cf to 7db4fe9 December 23, 2019 18:00

justinormont reviewed Dec 23, 2019

codemzs reviewed Dec 26, 2019

codemzs approved these changes Dec 26, 2019

codemzs merged commit 6ae3a3f into dotnet:master Dec 26, 2019

justinormont mentioned this pull request Dec 30, 2019

Upsampling with IDataView #4028 (Open)

This was referenced Jan 1, 2020

Additional changes to ExpressionTransformer #4614 (Merged)

Support saving as ONNX for ExpressionTransformer #4615 (Open)

ghost locked as resolved and limited conversation to collaborators Mar 19, 2022
