[DO NOT LAND] Switch to translation dataset in torchtext 0.9.0 release #1351
Deploy preview for pytorch-tutorials-preview ready! Built with commit e6db9a4: https://deploy-preview-1351--pytorch-tutorials-preview.netlify.app
Deploy preview for pytorch-tutorials-preview ready! Built with commit 4910b26: https://deploy-preview-1351--pytorch-tutorials-preview.netlify.app
Since Multi30k is being moved into experimental for this release, let's pick a different translation dataset, for example one of the IWSLT datasets.
Switch to |
The tutorial itself still has code such as
# ENC_EMB_DIM = 256
# DEC_EMB_DIM = 256
# ENC_HID_DIM = 512
# DEC_HID_DIM = 512
# ATTN_DIM = 64
# ENC_DROPOUT = 0.5
# DEC_DROPOUT = 0.5
Did you also confirm that the model converges to something meaningful under this new dataset?
I'm reducing the amount of data used in this tutorial, since IWSLT2017 has more data than Multi30k.
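One way to cap the training set is to take only the first N sentence pairs from the dataset iterator. The sketch below assumes the raw dataset yields `(source, target)` string tuples. `limit_pairs` and `MAX_TRAIN_PAIRS` are hypothetical names, and the cap value is purely illustrative, not the one chosen in this PR.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple

# Hypothetical cap; the value actually used for the tutorial is not stated here.
MAX_TRAIN_PAIRS = 30000


def limit_pairs(pairs: Iterable[Tuple[str, str]], max_pairs: int) -> Iterator[Tuple[str, str]]:
    """Yield at most max_pairs (source, target) pairs from an iterable dataset.

    Assumes each item is a (source_sentence, target_sentence) tuple of strings.
    """
    if max_pairs < 0:
        raise ValueError("max_pairs must be non-negative")
    # islice stops lazily, so the remainder of a large dataset is never read.
    return islice(pairs, max_pairs)


if __name__ == "__main__":
    toy = [("Hallo Welt", "Hello world"), ("Guten Morgen", "Good morning"), ("Danke", "Thanks")]
    print(list(limit_pairs(toy, 2)))
```

Taking the first N pairs is deterministic and cheap, but if the corpus file is ordered (for example by document), the subset may be less varied than a random sample of the same size.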
Branch updated from 2582bfa to 1276a3a.
Given the issues with convergence on this IWSLT2017 and the release of Multi30k being delayed, let's not release an update to this tutorial.
Branch updated from 77342e2 to a916e41.
Since |
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as
The torchtext 0.9.0 release will include the raw text datasets as a beta feature. This PR switches the tutorial's data to the raw dataset.
This PR should be tested against the PyTorch 1.8.0 and torchtext 0.9.0 release candidates.
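Because raw datasets yield plain text strings, the tutorial has to tokenize and numericalize the sentences itself. The following is a minimal pure-Python sketch of that step. It assumes whitespace tokenization and `(source, target)` string tuples. `tokenize`, `build_vocab`, `numericalize`, and the special tokens are hypothetical helpers, not torchtext APIs. A real tutorial would more likely use a proper tokenizer and torchtext's own vocabulary utilities.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

SPECIALS = ["<unk>", "<pad>", "<bos>", "<eos>"]  # hypothetical special tokens


def tokenize(text: str) -> List[str]:
    # Lowercased whitespace split as a stand-in for a real tokenizer.
    return text.lower().split()


def build_vocab(sentences: Iterable[str], min_freq: int = 1) -> Dict[str, int]:
    """Map each token seen at least min_freq times to an integer index."""
    counter: Counter = Counter()
    for sentence in sentences:
        counter.update(tokenize(sentence))
    vocab = {tok: i for i, tok in enumerate(SPECIALS)}
    # Sort by descending frequency, then alphabetically, for a deterministic order.
    for tok, freq in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def numericalize(sentence: str, vocab: Dict[str, int]) -> List[int]:
    """Convert a sentence to indices wrapped in <bos> ... <eos>; unknown tokens map to <unk>."""
    unk = vocab["<unk>"]
    ids = [vocab.get(tok, unk) for tok in tokenize(sentence)]
    return [vocab["<bos>"]] + ids + [vocab["<eos>"]]


if __name__ == "__main__":
    pairs: List[Tuple[str, str]] = [
        ("ein Hund läuft", "a dog runs"),
        ("ein Mann schläft", "a man sleeps"),
    ]
    src_vocab = build_vocab(src for src, _ in pairs)
    print(numericalize("ein Katze läuft", src_vocab))
```

Separate vocabularies would be built for the source and target sides. Raising `min_freq` shrinks the vocabulary and maps rare words to `<unk>`, which matters more on a larger corpus such as IWSLT2017.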