[DO NOT LAND] Switch to translation dataset in torchtext 0.9.0 release #1351

Closed
wants to merge 8 commits into from

zhangguanheng66
Contributor

The torchtext 0.9.0 release will include the raw text datasets as a beta feature. This PR switches the tutorial's data loading to the raw datasets.

This PR should be tested against the PyTorch 1.8.0 RC and the torchtext 0.9.0 RC.

@zhangguanheng66
Contributor Author

zhangguanheng66 commented Feb 9, 2021

We expect the CI tests to fail until CI switches to the 1.8.0 RC. The tutorial works with the torchtext nightly build, from which the RC package will be cut.
cc @brianjo @cpuhrsch

@netlify

netlify bot commented Feb 9, 2021

Deploy preview for pytorch-tutorials-preview ready!

Built with commit e6db9a4

https://deploy-preview-1351--pytorch-tutorials-preview.netlify.app

@netlify

netlify bot commented Feb 9, 2021

Deploy preview for pytorch-tutorials-preview ready!

Built with commit 4910b26

https://deploy-preview-1351--pytorch-tutorials-preview.netlify.app

Base automatically changed from master to main February 16, 2021 19:33
Base automatically changed from main to master February 16, 2021 19:37
Contributor

@cpuhrsch cpuhrsch left a comment

Since Multi30k is being moved into experimental for this release, let's pick a different translation dataset, for example one of the IWSLT datasets.

@zhangguanheng66
Contributor Author

zhangguanheng66 commented Feb 24, 2021

Switched to IWSLT2017.

Contributor

@cpuhrsch cpuhrsch left a comment

The tutorial itself still has code such as:

# ENC_EMB_DIM = 256
# DEC_EMB_DIM = 256
# ENC_HID_DIM = 512
# DEC_HID_DIM = 512
# ATTN_DIM = 64
# ENC_DROPOUT = 0.5
# DEC_DROPOUT = 0.5

Did you also confirm that the model converges to something meaningful under this new dataset?

@zhangguanheng66
Contributor Author

I'm adjusting the amount of data used in this tutorial because IWSLT2017 is larger than Multi30k.

@zhangguanheng66 zhangguanheng66 force-pushed the translation branch 2 times, most recently from 2582bfa to 1276a3a Compare February 24, 2021 16:46
Contributor

@cpuhrsch cpuhrsch left a comment

Given the convergence issues on IWSLT2017 and the delayed release of Multi30k, let's not release an update to this tutorial.

@zhangguanheng66 zhangguanheng66 changed the title [1.8 release] Switch to translation dataset in torchtext 0.9.0 release [DO NOT LAND] Switch to translation dataset in torchtext 0.9.0 release Feb 24, 2021
@zhangguanheng66
Contributor Author

Since the Multi30k dataset will not be released in torchtext v0.9.0 (pytorch/text#1207), we will hold this PR until Multi30k is released.

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the stale Stale PRs label Sep 26, 2024
@github-actions github-actions bot closed this Oct 26, 2024