re-parameterization of Multi30K dataset #1207

parmeet · 2021-02-23T05:25:40Z

Summary:

Refactored the Multi30K translation dataset in line with the re-parameterization of the IWSLT dataset (#1191)
Moved it to the experimental folder
Added example usage
Added a new doc page for the experimental raw datasets
Updated the docstring

codecov · 2021-02-23T15:01:25Z

Codecov Report

Merging #1207 (d6e3a80) into master (27f9ed0) will increase coverage by 0.02%.
The diff coverage is 90.58%.

@@            Coverage Diff             @@
##           master    #1207      +/-   ##
==========================================
+ Coverage   73.21%   73.23%   +0.02%     
==========================================
  Files          67       67              
  Lines        3689     3718      +29     
==========================================
+ Hits         2701     2723      +22     
- Misses        988      995       +7
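The percentages in the diff above follow from hits divided by lines (hits plus misses). A minimal sketch that reproduces them, assuming the values are truncated rather than rounded to two decimals; that is an inference from these figures, not documented Codecov behavior, and `coverage_pct` is a hypothetical helper name:

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage percentage, truncated (not rounded) to two decimals.

    Truncation is an assumption inferred from the numbers in the report.
    """
    # Integer arithmetic avoids floating-point surprises before the final division.
    return (hits * 10_000 // lines) / 100


# Figures taken from the coverage diff: hits + misses == lines in both columns.
assert 2701 + 988 == 3689
assert 2723 + 995 == 3718

master = coverage_pct(2701, 3689)   # coverage on master (27f9ed0)
pr = coverage_pct(2723, 3718)       # coverage with #1207 (d6e3a80)
delta = round(pr - master, 2)       # change in percentage points

print(master, pr, delta)
```

Plain rounding would give 73.22% and 73.24% instead, which is why the sketch truncates to match the report.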

Impacted Files	Coverage Δ
torchtext/datasets/__init__.py	`100.00% <ø> (ø)`
torchtext/experimental/datasets/translation.py	`75.00% <80.00%> (-0.31%)`	⬇️
torchtext/experimental/datasets/raw/multi30k.py	`91.13% <91.13%> (ø)`
torchtext/experimental/datasets/raw/__init__.py	`100.00% <100.00%> (ø)`
torchtext/data/datasets_utils.py	`91.05% <0.00%> (-0.82%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 27f9ed0...d6e3a80.

cpuhrsch · 2021-02-23T23:34:56Z

torchtext/experimental/datasets/raw/multi30k.py

+MD5 = {'train': [], 'valid': [], 'test': []}
+NUM_LINES = {'train': [], 'valid': [], 'test': []}
+
+for task in SUPPORTED_DATASETS.keys():


nit: iterating over a dict with "for ... in" yields its keys by default, so I don't think you need .keys() here
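A small sketch of the point in this nit, assuming `SUPPORTED_DATASETS` is an ordinary dict. Its real contents are not shown in the diff, so the keys and values below are hypothetical placeholders:

```python
# Hypothetical stand-in for SUPPORTED_DATASETS; the actual mapping in
# multi30k.py is not visible in the snippet under review.
SUPPORTED_DATASETS = {
    "task_a": {"train": "placeholder-a"},
    "task_b": {"train": "placeholder-b"},
}

# Explicit .keys() call, as written in the diff.
keys_explicit = [task for task in SUPPORTED_DATASETS.keys()]

# Iterating over the dict directly yields the same keys in the same order.
keys_implicit = [task for task in SUPPORTED_DATASETS]

assert keys_explicit == keys_implicit
print(keys_implicit)
```

Both loops behave the same, so dropping `.keys()` is purely a style cleanup.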

parmeet added 2 commits February 23, 2021 00:19

refactoring multi30k

a0fc989

added TODO

de42170

facebook-github-bot added the cla signed label Feb 23, 2021

parmeet added 3 commits February 23, 2021 09:54

refactoring multi30k

2975871

added TODO

99a4b56

Merge branch 'multi30krefactor' of github.com:parmeet/text into multi30krefactor

b8cdd47

parmeet added 6 commits February 23, 2021 12:31

fixed doc and added num lines

a2e99fd

fixed linter issue

9ee447a

fixing raw dataset unit tests

5195592

Merge branch 'master' of github.com:pytorch/text into multi30krefactor

faab6cb

fixing unicode issues in raw_datasets

45040b0

moving multi30k to experimental folder

d6e3a80

parmeet changed the title ~~[WIP] re-parameterization of Multi30K dataset~~ re-parameterization of Multi30K dataset Feb 23, 2021

cpuhrsch reviewed Feb 23, 2021

cpuhrsch approved these changes Feb 23, 2021

parmeet merged commit db8da95 into pytorch:master Feb 23, 2021

zhangguanheng66 mentioned this pull request Feb 24, 2021

[DO NOT LAND] Switch to translation dataset in torchtext 0.9.0 release pytorch/tutorials#1351

Closed

parmeet deleted the multi30krefactor branch February 24, 2021 20:24
