@antoniovs1029 antoniovs1029 commented Sep 13, 2019

Fixes #3992, where FeaturizeText is called with an options object but without any inputColumnNames. In that case the input column name is expected to default to the outputColumnName.

It also covers the corner cases where the user calls FeaturizeText with an options object and passes 'null' or an empty string as inputColumnNames.
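The defaulting rule described above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the ML.NET source: the helper name `resolve_input_columns` is hypothetical, and treating a list that contains only null or empty entries as "not provided" is an assumption made for the sketch.

```python
from typing import List, Optional, Sequence


def resolve_input_columns(
    output_column_name: str,
    input_column_names: Optional[Sequence[Optional[str]]] = None,
) -> List[str]:
    """Hypothetical helper: pick the input columns for a text featurizer.

    Assumption for this sketch: if no input column names are given
    (None, an empty sequence, or only null/empty entries), fall back
    to the output column name.
    """
    if not output_column_name:
        raise ValueError("output_column_name must be a non-empty string")

    # all() over an empty sequence is True, so this one check covers
    # "empty sequence" and "only null/empty entries" alike.
    if input_column_names is None or all(not name for name in input_column_names):
        return [output_column_name]

    # Otherwise use the names exactly as the caller supplied them.
    return list(input_column_names)
```

Under these assumptions, `resolve_input_columns("Features")`, `resolve_input_columns("Features", None)` and `resolve_input_columns("Features", [""])` all fall back to the output column name, while an explicit list of names is returned unchanged.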

Three tests cover these three cases. Each verifies that the text is featurized correctly and that the features are saved in a column with the same name as the input column. All three are based on TextFeaturizerWithWordFeatureExtractorTest, but they do not use a PredictionEngine, because the output column hides the input column. Instead, they read the rows of the data view and verify the values directly.

A small fix to the FeaturizeText documentation is also included.

@antoniovs1029 antoniovs1029 requested a review from a team as a code owner September 13, 2019 00:04
dnfclas commented Sep 13, 2019

CLA assistant check
All CLA requirements met.

@yaeldekel yaeldekel left a comment

:shipit:

@yaeldekel yaeldekel mentioned this pull request Sep 19, 2019
4 tasks
@antoniovs1029 antoniovs1029 merged commit edfd10f into dotnet:master Sep 19, 2019
KsenijaS pushed a commit to KsenijaS/machinelearning that referenced this pull request Sep 27, 2019
dotnet#4211)

* Fixed issue dotnet#3992 with TextFeaturizer when no inputColumnName is provided, and when 'null' is passed explicitly as inputColumnNames.

* Added Tests.

* Fixed a minor mistake in documentation.
@ghost ghost locked as resolved and limited conversation to collaborators Mar 20, 2022
Successfully merging this pull request may close these issues.

FeaturizeText should allow only outputColumnName to be defined
3 participants