Skip to content

FeaturizeText should allow only outputColumnName to be defined #3992

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
artidoro opened this issue Jul 11, 2019 · 1 comment · Fixed by #4211
Closed

FeaturizeText should allow only outputColumnName to be defined #3992

artidoro opened this issue Jul 11, 2019 · 1 comment · Fixed by #4211
Assignees
Labels
good first issue Good for newcomers P2 Priority of the issue for triage purpose: Needs to be fixed at some point. up-for-grabs A good issue to fix if you are trying to contribute to the project

Comments

@artidoro
Copy link
Contributor

artidoro commented Jul 11, 2019

One of the extension methods for FeaturizeText needs both the outputColumnName and the inputColumnNames to be provided.

There is no compile error if inputColumnNames is not provided, only a runtime error.

We should fix the error so that when inputColumnNames is not provided, it is set to new[] { outputColumnName } as we do everywhere else in the code base.

public static TextFeaturizingEstimator FeaturizeText(this TransformsCatalog.TextTransforms catalog,
string outputColumnName,
TextFeaturizingEstimator.Options options,
params string[] inputColumnNames)
=> new TextFeaturizingEstimator(Contracts.CheckRef(catalog, nameof(catalog)).GetEnvironment(),
outputColumnName, inputColumnNames, options);

@SnakyBeaky
Copy link
Contributor

The only similar case I've seen, is when inputColumnName is a single string and not a string[]. In those cases, the null-coalescing operator is used.

For example, in Microsoft.ML.Transforms/Text/TokenizingByCharacters.cs#L595 :

new[] { (outputColumnName, inputColumnName ?? outputColumnName)

But I have not seen this done when the input column names is an array and not a single string.

@harishsk harishsk added the P2 Priority of the issue for triage purpose: Needs to be fixed at some point. label Aug 19, 2019
@codemzs codemzs assigned antoniovs1029 and unassigned wangyems Sep 9, 2019
antoniovs1029 added a commit that referenced this issue Sep 19, 2019
* Fixed issue #3992 with TextFeaturizer when no inputColumnName is provided, and when 'null' is passed explicitly as inputColumnNames.

* Added Tests.

* Fixed a minor mistake in documentation.
KsenijaS pushed a commit to KsenijaS/machinelearning that referenced this issue Sep 27, 2019
dotnet#4211)

* Fixed issue dotnet#3992 with TextFeaturizer when no inputColumnName is provided, and when 'null' is passed explicitly as inputColumnNames.

* Added Tests.

* Fixed a minor mistake in documentation.
@ghost ghost locked as resolved and limited conversation to collaborators Mar 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
good first issue Good for newcomers P2 Priority of the issue for triage purpose: Needs to be fixed at some point. up-for-grabs A good issue to fix if you are trying to contribute to the project
Projects
None yet
5 participants