
Fixes #3992 and corner cases of inputColumnNames on FeaturizeText #4211


Merged: 4 commits merged into dotnet:master on Sep 19, 2019


antoniovs1029 (Member) commented Sep 13, 2019

Fixes #3992, where FeaturizeText is called with an options object but without inputColumnNames. In that case the input column name is expected to default to outputColumnName.

It also covers the corner cases where FeaturizeText is called with an options object and inputColumnNames is passed as null or as an empty string.
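The defaulting rule described above can be modeled outside ML.NET with a short Python sketch. The helper `resolve_input_columns` is hypothetical and is not part of ML.NET's API. It assumes that omitting inputColumnNames yields an empty array (as a C# `params string[]` argument does) and that passing null corresponds to Python's `None`. The empty-string case is not modeled, because its exact handling is not spelled out here.

```python
from typing import List, Optional, Sequence


def resolve_input_columns(output_column: str,
                          input_columns: Optional[Sequence[str]]) -> List[str]:
    """Hypothetical model of the inputColumnNames defaulting rule.

    If the input columns are omitted (an empty sequence) or explicitly
    None (C# null), fall back to the output column as the single input.
    """
    if not input_columns:  # True for both None and an empty sequence
        return [output_column]
    return list(input_columns)


print(resolve_input_columns("Text", None))          # ['Text']
print(resolve_input_columns("Text", []))            # ['Text']
print(resolve_input_columns("Features", ["Text"]))  # ['Text']
```

Treating `None` and an empty sequence the same way reflects the intent of the fix: both the omitted form and the explicit-null form fall back to outputColumnName, so the featurized result replaces the input column under the same name.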

Three tests are added, one for each of these cases. Each one verifies that the text is featurized correctly and that the features are written to a column with the same name as the input column. The tests are based on TextFeaturizerWithWordFeatureExtractorTest, but they do not use a PredictionEngine, because the output column hides the input column. Instead, they read the rows of the resulting IDataView directly to verify the values.
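To show what "the output column hides the input column" means, here is a toy Python model of name-based column shadowing. Everything in it (`Column`, `ToySchema`, `by_name`, `is_hidden`) is hypothetical and is not ML.NET's IDataView API. It assumes only what the description states: when an output column has the same name as its input, a lookup by that name returns the newer column, while the original stays reachable by position. The feature values are placeholders, not real FeaturizeText output.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Column:
    name: str
    values: List[Any]


class ToySchema:
    """Toy model of name-based column shadowing (not ML.NET's IDataView).

    A later column with the same name hides an earlier one for lookups
    by name, but every column remains reachable by its index.
    """

    def __init__(self) -> None:
        self.columns: List[Column] = []

    def add(self, column: Column) -> None:
        self.columns.append(column)

    def by_name(self, name: str) -> Column:
        # Scan from the end so the most recently added column wins.
        for column in reversed(self.columns):
            if column.name == name:
                return column
        raise KeyError(name)

    def is_hidden(self, index: int) -> bool:
        # Hidden if any later column shares this column's name.
        name = self.columns[index].name
        return any(c.name == name for c in self.columns[index + 1:])


schema = ToySchema()
schema.add(Column("Text", ["hello world"]))  # original text input
schema.add(Column("Text", [[1.0, 0.5]]))     # placeholder featurized output
print(schema.by_name("Text").values)         # [[1.0, 0.5]]
print(schema.is_hidden(0))                   # True
print(schema.columns[0].values)              # ['hello world']
```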

A small fix to the FeaturizeText documentation is also included.

antoniovs1029 requested a review from a team as a code owner, September 13, 2019 00:04
dnfclas commented Sep 13, 2019

CLA assistant check
All CLA requirements met.


yaeldekel left a comment


:shipit:

yaeldekel mentioned this pull request Sep 19, 2019
antoniovs1029 merged commit edfd10f into dotnet:master Sep 19, 2019
KsenijaS pushed a commit to KsenijaS/machinelearning that referenced this pull request Sep 27, 2019
dotnet#4211)

* Fixed issue dotnet#3992 with TextFeaturizer when no inputColumnName is provided, and when 'null' is passed explicitly as inputColumnNames.

* Added Tests.

* Fixed a minor mistake in documentation.
ghost locked as resolved and limited conversation to collaborators Mar 20, 2022
Development

Successfully merging this pull request may close these issues.

FeaturizeText should allow only outputColumnName to be defined
3 participants