Hello,
I generated a text file called openbookQA_train. The contents of this file are shown below:
<sos> The sun is responsible for <mcoption> (A) puppies learning new tricks <eos>
<sos> The sun is responsible for <mcoption> (B) children growing up and getting old <eos>
<sos> The sun is responsible for <mcoption> (C) flowers wilting in a vase <eos>
<sos> The sun is responsible for <mcoption> (D) plants sprouting, blooming and wilting <eos>
I am trying to use (or define) a torchtext Iterator to generate the input that I can pass to my Transformer.
I want each sample in next(iter(openbookQA_train)).text to be a sequence of integers obtained by tokenizing one line of words between <sos> and <eos> (including those special tokens). For a sample that contains fewer tokens than the bptt length, I want the sample to include all of the tokenized words between <sos> and <eos>, with the remaining slots filled with the <pad> token up to the bptt length.
How can I achieve this?
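For concreteness, here is a minimal sketch of the kind of pipeline I have in mind, using the legacy torchtext Field / TabularDataset / Iterator API (the file path openbookQA_train, the field name text, and bptt = 35 are just placeholders):

```python
from torchtext.data import Field, TabularDataset, Iterator  # torchtext.legacy.data on torchtext >= 0.9

bptt = 35  # placeholder bptt length

# Each line already contains <sos> ... <eos>, so plain whitespace tokenization keeps them,
# and fix_length pads (or truncates) every example to exactly bptt tokens using <pad>.
TEXT = Field(sequential=True,
             tokenize=lambda line: line.split(),
             pad_token='<pad>',
             fix_length=bptt,
             batch_first=True)

# One column per line; 'tsv' works here because the lines contain no tab characters.
openbookQA_train = TabularDataset(path='openbookQA_train',  # placeholder path
                                  format='tsv',
                                  fields=[('text', TEXT)])

# Build the vocabulary; the special tokens also occur in the data, so they get their own ids.
TEXT.build_vocab(openbookQA_train, specials=['<sos>', '<eos>', '<mcoption>'])

train_iter = Iterator(openbookQA_train, batch_size=4, shuffle=True)

batch = next(iter(train_iter))
print(batch.text.shape)  # (4, bptt): integer token ids, right-padded with the <pad> index
```

Is something along these lines the right approach, or is there a better way?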
Thank you,