-
Notifications
You must be signed in to change notification settings - Fork 812
fixing and re-organizing pipelines #1250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1250 +/- ##
=======================================
Coverage 78.80% 78.80%
=======================================
Files 67 67
Lines 3624 3624
=======================================
Hits 2856 2856
Misses 768 768 Continue to review full report at Codecov.
|
examples/data_pipeline/README.md
Outdated
python pipelines.py --pipeline pytext | ||
|
||
|
||
## Experimental PyText |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we want to remove this pipeline?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The github repo of pytext is not maintained anymore and is not in good state as of this writing. This would mean that we have code in torchtext that is breaking going forward. I wasn't so sure, if we still want to maintain these code snippets that may sporadically break?
cc: @hudeven
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove this in a separate PR, since we're not entirely clear on it and the other changes in this diff can go ahead, plus we might need it for comparison relatively soon.
if __name__ == "__main__": | ||
parser = argparse.ArgumentParser(description='Data procesing pipelines') | ||
parser.add_argument('--pipeline', type=str, default='sentencepiece', | ||
help='The name of pipeline') | ||
parser.add_argument('--dataset', type=str, default='AG_NEWS', | ||
help='Dataset for performance benchmark') | ||
parser.add_argument('--spm-filename', type=str, default='m_user.model', | ||
parser.add_argument('--spm-filename', type=str, default='text_unigram_25000', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for ease of use in default mode. If the user does not have access to spm model, user can simply run the code in default mode that will download one of the pre-trained spm model (text_unigram_25000). This change does not impact previous behavior, i.e if the use indeed specify m_user.model (with name other that the name in pre-trained spm models) it will work with the user model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the comment before landing
FIxing pipelines according to new features in torchtext and removing pytext dependency