TREC dataset #92
Does the dataset contain corrupted UTF-8? If not, I’d rather just load it with Python than add another dependency. (If it does, then this is probably fine, although a built-in error strategy might also work.)
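For reference, a minimal sketch of what a built-in error strategy could look like with only the standard library. The helper name `read_lines` and the two sample lines are illustrative assumptions, not code or data from this PR; `io.open` is used so the same call works on Python 2 and 3.

```python
import io
import os
import tempfile


def read_lines(path, errors="replace"):
    """Read a file as UTF-8, handling undecodable bytes via `errors`.

    `read_lines` is a hypothetical helper, not part of torchtext.
    """
    with io.open(path, encoding="utf-8", errors=errors) as f:
        return [line.rstrip("\n") for line in f]


# Made-up sample data: the second line contains a byte (0xE3) that is
# not valid UTF-8 in this position.
tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".label")
tmp.write(b"DESC:manner How did serfdom develop ?\n")
tmp.write(b"LOC:city Where is S\xe3o Paulo ?\n")
tmp.close()

lines = read_lines(tmp.name)  # bad byte becomes U+FFFD instead of raising
os.remove(tmp.name)
```

With `errors="strict"` (the default for `io.open`) the same file would raise `UnicodeDecodeError`; `"replace"` and `"ignore"` are the usual alternatives when a few bad bytes are acceptable.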
torchtext/datasets/trec.py
Outdated
    examples = []

    def get_label_str(label):
        return label.split(':')[0] if not fine_grained else label
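To make the effect of `fine_grained` concrete, here is a small standalone sketch of the same label logic. The wrapper `make_label_getter` is a hypothetical name introduced only so the snippet runs on its own; in the diff, `fine_grained` comes from the enclosing scope. The sample label follows the `COARSE:fine` format that TREC question labels use.

```python
def make_label_getter(fine_grained=False):
    """Return a label function mirroring get_label_str from the diff.

    `make_label_getter` is a hypothetical wrapper for illustration only.
    """
    def get_label_str(label):
        # Keep the full "COARSE:fine" label, or only the coarse part.
        return label if fine_grained else label.split(':')[0]
    return get_label_str


coarse = make_label_getter(fine_grained=False)
fine = make_label_getter(fine_grained=True)

coarse_label = coarse("DESC:manner")
fine_label = fine("DESC:manner")
```

Here `coarse_label` is `"DESC"` and `fine_label` is `"DESC:manner"`.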
torchtext/datasets/trec.py
Outdated
    text_field: The field that will be used for the sentence.
    label_field: The field that will be used for label data.
    root: The root directory that the dataset's zip archive will be
        expanded into; therefore the directory in whose trees
torchtext/datasets/trec.py
Outdated
    from six.moves import urllib


    class TREC(data.ZipDataset):
yields
When fine_grained=True:
Let me know if there is anything else you'd like to see. I'll leave the WIP until I hear back.