re-parameterization of Multi30K dataset #1207

Merged
merged 11 commits into from
Feb 23, 2021

parmeet (Contributor) commented Feb 23, 2021

Summary:

  • Refactored the Multi30K translation dataset in accordance with the re-parameterization of the IWSLT dataset (#1191)
  • Moved it to the experimental folder
  • Added example usage
  • Added a new doc page for experimental raw datasets
  • Updated the docstring

codecov bot commented Feb 23, 2021

Codecov Report

Merging #1207 (d6e3a80) into master (27f9ed0) will increase coverage by 0.02%.
The diff coverage is 90.58%.

@@            Coverage Diff             @@
##           master    #1207      +/-   ##
==========================================
+ Coverage   73.21%   73.23%   +0.02%     
==========================================
  Files          67       67              
  Lines        3689     3718      +29     
==========================================
+ Hits         2701     2723      +22     
- Misses        988      995       +7     
Impacted Files                                     Coverage Δ
torchtext/datasets/__init__.py                     100.00% <ø> (ø)
torchtext/experimental/datasets/translation.py     75.00% <80.00%> (-0.31%) ⬇️
torchtext/experimental/datasets/raw/multi30k.py    91.13% <91.13%> (ø)
torchtext/experimental/datasets/raw/__init__.py    100.00% <100.00%> (ø)
torchtext/data/datasets_utils.py                   91.05% <0.00%> (-0.82%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 27f9ed0...d6e3a80.

parmeet changed the title from "[WIP] re-parameterization of Multi30K dataset" to "re-parameterization of Multi30K dataset" on Feb 23, 2021
MD5 = {'train': [], 'valid': [], 'test': []}
NUM_LINES = {'train': [], 'valid': [], 'test': []}

for task in SUPPORTED_DATASETS.keys():
Review comment from a Contributor:

nit: iterating over a dict with "for ... in" yields its keys by default, so I don't think you need .keys() here
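
For reference, a minimal sketch of the point in this nit. The contents of SUPPORTED_DATASETS below are hypothetical placeholders (the real mapping is not shown in this thread); the only thing being illustrated is that iterating over a dict directly visits the same keys, in the same order, as iterating over .keys().

```python
# Hypothetical stand-in for the mapping under review; the real contents of
# SUPPORTED_DATASETS are not shown in this thread.
SUPPORTED_DATASETS = {
    "task1": ["train", "valid"],
    "task2": ["train", "test"],
}

# Iterating over the explicit keys view.
keys_explicit = [task for task in SUPPORTED_DATASETS.keys()]

# Iterating over the dict itself, which yields its keys by default.
keys_implicit = [task for task in SUPPORTED_DATASETS]

# Both loops visit the same keys in the same (insertion) order.
print(keys_explicit == keys_implicit)
```

Under that assumption the loop header could be written as "for task in SUPPORTED_DATASETS:" with no change in behavior.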

3 participants