-
Notifications
You must be signed in to change notification settings - Fork 813
Multi30k mocked testing #1554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multi30k mocked testing #1554
Conversation
with open(txt_file, "w") as f: | ||
for i in range(5): | ||
rand_string = " ".join( | ||
random.choice(string.ascii_letters) for i in range(seed) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thought: since all of our datasets are utf-8 files, does it make sense to write unicode strings to make sure we don't have lingering bugs from default encodings when opening files? Maybe this is overkill, but it's been a big source of bugs when I did mostly windows development.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya, agreed. I think it is a good suggestion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will keep it as a follow-up item as generating random UTF-8 string is not trivial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM. Left some nit comments. Also think @erip has a valid point about writing unicode strings when we mock data. Should we do this for all our tests moving forward?
test/datasets/test_multi30k.py
Outdated
rand_string = " ".join( | ||
random.choice(string.ascii_letters) for i in range(seed) | ||
) | ||
content = f'{rand_string}\n' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we use double quotes here instead of single quotes
Added this as a follow up item to the GH issue! |
Reference issue #1493