Skip to content

CountFeatureSelectingEstimator no selection support #5000

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 10, 2020

Conversation

Lynx1820
Copy link
Contributor

@Lynx1820 Lynx1820 commented Apr 4, 2020

Issue: The GatherElements ONNX operator used by SlotsDroppingTransformer (transformer for CountFeatureSelectingEstimator) has issues when there's nothing selected, so mlnet needs to handle this case separately.

This PR

  • Fixes the issue mentioned above
  • Modifies the estimator's test to be more robust

Other issues:

  • There's no support for string initializing in ONNX, which prevents us from handling the case where no string columns get selected.
  • The documentation for CountFeatureSelectingEstimator is ambiguous. It states that it supports input as "vector or scalar of numeric, text or key data types". However, only strings, doubles, and singles are supported.

@Lynx1820 Lynx1820 requested a review from a team as a code owner April 4, 2020 00:04
@@ -896,27 +896,25 @@ public void SaveAsOnnx(OnnxContext ctx)
public bool SaveAsOnnxCore(OnnxContext ctx, int iinfo, string srcVariableName, string dstVariableName)
{
string opType;
if (_srcTypes[iinfo] is VectorDataViewType)
var slots = _slotDropper[iinfo].GetPreservedSlots();
if (_srcTypes[iinfo] is VectorDataViewType && slots.Count() > 0)
Copy link
Contributor

@frank-dong-ms-zz frank-dong-ms-zz Apr 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_srcTypes[iinfo] is VectorDataViewType && slots.Count() > 0 [](start = 20, length = 59)

what if _srcTypes[iinfo] is VectorDataViewType && slots.Count() == 0? should we return true or false?
in current implementation seems it will goes into the "else" condition in line 917 and which should handling double numeric data type #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If slots.Count == 0, then the column get's "suppressed". It's not handled well by the GatherElements operator, so we create an empty vector in the else clause. Does that make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but we better to make the logic clear like below, don't let different cases fall into same condition which will make code harder to read and will cause potential issue in future:

if (_srcTypes[iinfo] is VectorDataViewType)
{
if (count > 0) { ... }
else { ... }
}
else { ... }


In reply to: 404404988 [](ancestors = 404404988)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

Copy link
Contributor

@frank-dong-ms-zz frank-dong-ms-zz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@Lynx1820 Lynx1820 merged commit 8660ecc into dotnet:master Apr 10, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Mar 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants