Skip to content

[WIP] add unicode generation to IWSLT tests #1608

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

erip
Copy link
Contributor

@erip erip commented Feb 14, 2022

Addresses #1607

@parmeet
Copy link
Contributor

parmeet commented Feb 14, 2022

@parmeet
Copy link
Contributor

parmeet commented Feb 15, 2022

Does this sounds familiar https://app.circleci.com/pipelines/github/pytorch/text/4926/workflows/b22fa7de-be18-4fde-af6e-73d86731fb80/jobs/164125 ?

@erip This issue seems random. It didn't happen in our latest unit-testing on the release branch #1613. I am not sure why it is the case. Do you think it has anything to do with XML not well formed as you propose the fix in this PR?

@erip
Copy link
Contributor Author

erip commented Feb 15, 2022

Sorry -- I must've missed this notification. It's unclear, but it seems like it would be a consistent error since the closing tag always ends like ">>"... Hmm

@Nayef211
Copy link
Contributor

Hey @erip. I just wanted to touch base on this PR again to see what the blockers and next steps might be? I believe this is the only thing left before we can close out #1493

@erip
Copy link
Contributor Author

erip commented Feb 25, 2022

Sorry -- the semester has ramped up a bit for me in the last couple of weeks. 😄 I suspect (but haven't actually tested) that adding lxml support for reading would address this problem (by way of XMLParser's recover kwarg . I'm a bit hesitant about that approach because it adds a runtime dependency to address an issue that tests would introduce since ... we don't want users to silently recover from their own broken XML.

@Nayef211
Copy link
Contributor

Nayef211 commented Mar 3, 2022

Sorry -- the semester has ramped up a bit for me in the last couple of weeks. 😄 I suspect (but haven't actually tested) that adding lxml support for reading would address this problem (by way of XMLParser's recover kwarg . I'm a bit hesitant about that approach because it adds a runtime dependency to address an issue that tests would introduce since ... we don't want users to silently recover from their own broken XML.

Gotcha thanks for the context. I can take over investigating how to fix this test. I was also wondering why we have "</doc" in the xml_tags list even though its a closing tag (https://github.com/pytorch/text/blob/main/test/datasets/test_iwslt2016.py#L42)? Wouldn't this lead to an incorrectly formed XML file?

@Nayef211
Copy link
Contributor

Nayef211 commented Mar 9, 2022

Gotcha thanks for the context. I can take over investigating how to fix this test. I was also wondering why we have "</doc" in the xml_tags list even though its a closing tag (https://github.com/pytorch/text/blob/main/test/datasets/test_iwslt2016.py#L42)? Wouldn't this lead to an incorrectly formed XML file?

I added a new PR (#1642) that removes the closing tag from the xml tags list. Lmk what you guys think @erip @parmeet

Nayef211 added a commit that referenced this pull request Mar 9, 2022
* meaningless change to make XML well formed (though it is not important).

* Remove closing doc tag from xml list. Update how close tags are created

Co-authored-by: Elijah Rippeth <[email protected]>
Co-authored-by: nayef211 <[email protected]>
@Nayef211
Copy link
Contributor

Nayef211 commented Mar 9, 2022

Closing this now that #1642 has been merged

@Nayef211 Nayef211 closed this Mar 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants