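My text is encrypted, so all I get is a sequence of numbers such as 14 108 28 30 15 13 294 29 20 18 23 21 25. There are roughly 1,300 distinct tokens in total. I generated my own vocab.txt and added [PAD] [CLS] [SEP] [MASK] [UNK]. Normally one writes tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0'). If I want the tokenizer to be based on my own vocab.txt, how do I do that?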
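If you build the tokenizer's vocabulary yourself, the model has to be pretrained again, because the pretrained embeddings are indexed by the original vocabulary. This is generally not recommended.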
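For the vocabulary side of the question, here is a minimal sketch in plain Python of how a vocab.txt for space-separated numeric tokens could be built and used to encode text. The helpers `build_vocab`, `load_vocab`, and `encode` are hypothetical names made up for illustration, not PaddleNLP API, and the "one token per line, line number = id" layout is an assumption based on the usual BERT-style vocab file. I believe the `ErnieTokenizer` constructor in PaddleNLP also accepts a `vocab_file` path directly, but check that against your installed version.

```python
import tempfile
from pathlib import Path

# Special tokens listed in the question; they get the lowest ids here (an assumption).
SPECIAL_TOKENS = ["[PAD]", "[CLS]", "[SEP]", "[MASK]", "[UNK]"]


def build_vocab(texts, vocab_path):
    """Write special tokens first, then each distinct token in first-seen order."""
    seen = dict.fromkeys(SPECIAL_TOKENS)  # dict keeps insertion order
    for text in texts:
        for tok in text.split():
            seen.setdefault(tok)
    Path(vocab_path).write_text("\n".join(seen) + "\n", encoding="utf-8")
    return list(seen)


def load_vocab(vocab_path):
    """Map each token to its line index in the vocab file."""
    lines = Path(vocab_path).read_text(encoding="utf-8").splitlines()
    return {tok: idx for idx, tok in enumerate(lines)}


def encode(text, vocab):
    """Split on whitespace and wrap the ids in [CLS] ... [SEP]; unknown tokens map to [UNK]."""
    unk = vocab["[UNK]"]
    ids = [vocab["[CLS]"]]
    ids += [vocab.get(tok, unk) for tok in text.split()]
    ids.append(vocab["[SEP]"])
    return ids


if __name__ == "__main__":
    corpus = ["14 108 28 30 15 13 294 29 20 18 23 21 25"]
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "vocab.txt"
        build_vocab(corpus, path)
        vocab = load_vocab(path)
        print(encode("14 108 999", vocab))  # 999 is not in the corpus, so it maps to [UNK]
```

Note that this only covers tokenization. As the reply above says, a model initialized from ernie-1.0 weights would still have embeddings tied to the original vocabulary, so the ids produced here are only meaningful for a model trained with this vocab.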
wj-Mcat