Commit 2dd5e25

Nayef211, erip, nayef211 authored
Add unicode generation to IWSLT tests (followup to #1608) (#1642)
* meaningless change to make XML well formed (though it is not important).
* Remove closing doc tag from xml list. Update how close tags are created

Co-authored-by: Elijah Rippeth <[email protected]>
Co-authored-by: nayef211 <[email protected]>
1 parent ec364a2 commit 2dd5e25

2 files changed, with 4 additions and 4 deletions

test/datasets/test_iwslt2016.py

Lines changed: 2 additions & 2 deletions
@@ -39,15 +39,15 @@ def _generate_uncleaned_train():
         "<title",
         "<speaker",
         "<doc",
-        "</doc",
     ]
     for i in range(100):
         rand_string = " ".join(random.choice(string.ascii_letters) for i in range(10))
         # With a 10% change, add one of the XML tags which is cleaned
         # to ensure cleaning happens appropriately
         if random.random() < 0.1:
             open_tag = random.choice(xml_tags) + ">"
-            close_tag = "</" + open_tag[1:] + ">"
+            # Open tag already contains the closing >
+            close_tag = "</" + open_tag[1:]
             file_contents.append(open_tag + rand_string + close_tag)
         else:
             examples.append(rand_string + "\n")

test/datasets/test_iwslt2017.py

Lines changed: 2 additions & 2 deletions
@@ -36,15 +36,15 @@ def _generate_uncleaned_train():
         "<title",
         "<speaker",
         "<doc",
-        "</doc",
     ]
     for i in range(100):
         rand_string = " ".join(random.choice(string.ascii_letters) for i in range(10))
         # With a 10% change, add one of the XML tags which is cleaned
         # to ensure cleaning happens appropriately
         if random.random() < 0.1:
             open_tag = random.choice(xml_tags) + ">"
-            close_tag = "</" + open_tag[1:] + ">"
+            # Open tag already contains the closing >
+            close_tag = "</" + open_tag[1:]
             file_contents.append(open_tag + rand_string + close_tag)
         else:
             examples.append(rand_string + "\n")
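
To illustrate what the two changes do to the generated mock data, here is a minimal sketch that builds the open/close tag pairs before and after this commit. The helper names (`old_close_tag`, `new_close_tag`, `build_pairs`) are hypothetical and are not part of the test files. The tag lists contain only the entries visible in the hunks above; the real `xml_tags` list may include more entries earlier in the file.

```python
def old_close_tag(open_tag: str) -> str:
    # Pre-commit expression: open_tag[1:] already ends with ">",
    # so appending another ">" yields a doubled bracket.
    return "</" + open_tag[1:] + ">"


def new_close_tag(open_tag: str) -> str:
    # Post-commit expression: reuse the ">" that open_tag already carries.
    return "</" + open_tag[1:]


def build_pairs(xml_tags, make_close):
    # Hypothetical helper: build (open_tag, close_tag) for every tag prefix.
    # The test helper does the same for a single randomly chosen prefix.
    pairs = []
    for prefix in xml_tags:
        open_tag = prefix + ">"
        pairs.append((open_tag, make_close(open_tag)))
    return pairs


# Only the entries visible in the diff hunks (an assumption about the full list).
old_tags = ["<title", "<speaker", "<doc", "</doc"]
new_tags = ["<title", "<speaker", "<doc"]

for open_tag, close_tag in build_pairs(old_tags, old_close_tag):
    print("before:", open_tag, close_tag)

for open_tag, close_tag in build_pairs(new_tags, new_close_tag):
    print("after: ", open_tag, close_tag)
```

With the old expression every close tag ends in `>>`, and the `"</doc"` entry is itself treated as an opening tag, giving the pair `</doc>` and `<//doc>>`. After the commit each pair has the form `<doc>` and `</doc>`, which matches the commit message's stated goal of well-formed XML.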
