Software Environment

paddle-bfloat        0.1.2
paddle2onnx          1.0.6
paddlefsl            1.1.0
paddlehub            2.0.4
paddlenlp            2.5.2
paddlepaddle-gpu     2.3.0rc0.post112
tb-paddle            0.3.6
Bug Description

I ran into two problems with the gpt-cpm tokenizer. The first is that it has no pad_id:

[2023-06-06 09:06:23,496] [ ERROR] - Using pad_token, but it is not set yet.
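A possible way around the missing pad_id is to register the existing eos token as the pad token (the same line appears commented out in the reproduction code below); this is a sketch, assuming the gpt-cpm tokenizer supports add_special_tokens like other PaddleNLP tokenizers:

from paddlenlp.transformers import AutoTokenizer

dec_tokenizer = AutoTokenizer.from_pretrained('gpt-cpm-small-cn-distill')
# Reuse the eos token as the pad token so padding no longer errors out:
dec_tokenizer.add_special_tokens({'pad_token': dec_tokenizer.eos_token})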
After adding a pad_token, a decoding problem appeared, as shown below:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_11555/2309472343.py in <module>
     32 print(res['input_ids'])
     33 print(len(res['input_ids']))
---> 34 print(dec_tokenizer.decode(res['input_ids']))
     35 print()
     36 print()

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/transformers/tokenizer_utils_base.py in decode(self, token_ids, skip_special_tokens, clean_up_tokenization_spaces, **kwargs)
   3124             skip_special_tokens=skip_special_tokens,
   3125             clean_up_tokenization_spaces=clean_up_tokenization_spaces,
-> 3126             **kwargs,
   3127         )
   3128

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/transformers/tokenizer_utils.py in _decode(self, token_ids, skip_special_tokens, clean_up_tokenization_spaces, spaces_between_special_tokens, **kwargs)
   1443         self._decode_use_source_tokenizer = kwargs.pop("use_source_tokenizer", False)
   1444
-> 1445         filtered_tokens = self.convert_ids_to_tokens(token_ids, skip_special_tokens=skip_special_tokens)
   1446
   1447         # To avoid mixing byte-level and unicode for byte-level BPT

TypeError: convert_ids_to_tokens() got an unexpected keyword argument 'skip_special_tokens'
This seems to be caused by an interface inconsistency: _decode in tokenizer_utils.py passes skip_special_tokens to convert_ids_to_tokens(), but the gpt-cpm tokenizer's implementation of that method does not accept this keyword argument.
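Until the two signatures are aligned upstream, a user-side patch might paper over the mismatch. This is only a sketch, assuming convert_ids_to_tokens accepts a list of ids and the tokenizer exposes all_special_tokens as other PaddleNLP tokenizers do:

_orig_convert = dec_tokenizer.convert_ids_to_tokens

def _patched_convert_ids_to_tokens(ids, skip_special_tokens=False):
    # The underlying method does not accept skip_special_tokens,
    # so call it without the keyword and filter specials here instead.
    tokens = _orig_convert(ids)
    if skip_special_tokens:
        specials = set(dec_tokenizer.all_special_tokens)
        tokens = [t for t in tokens if t not in specials]
    return tokens

dec_tokenizer.convert_ids_to_tokens = _patched_convert_ids_to_tokens

With the instance attribute shadowing the class method, dec_tokenizer.decode(res['input_ids']) should no longer raise the TypeError.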
Stable Reproduction Steps & Code

import json

from paddlenlp.transformers import AutoTokenizer

enc_tokenizer = AutoTokenizer.from_pretrained('ernie-3.0-medium-zh')
dec_tokenizer = AutoTokenizer.from_pretrained('gpt-cpm-small-cn-distill')
# dec_tokenizer.add_special_tokens({'pad_token': dec_tokenizer.eos_token})

# data and the helpers (template, get_*_result, get_label) are defined elsewhere
for i in range(5):
    j_dic = json.loads(data[i])
    feat = template(get_title_result(j_dic), get_asr_result(j_dic),
                    get_ocr_result(j_dic), get_faces_result(j_dic))
    label = get_label(j_dic)
    print(label)

    # res = enc_tokenizer(
    #     text=feat,
    #     max_length=512,
    #     padding=True,
    #     truncation=True,
    #     return_token_type_ids=True,
    #     return_attention_mask=True
    # )
    # print(res['input_ids'])
    # print(len(res['input_ids']))
    # print(enc_tokenizer.decode(res['input_ids']))
    # print()

    res = dec_tokenizer(
        text=label,
        max_length=256,
        padding=True,
        truncation=True,
        return_token_type_ids=False,
        return_attention_mask=True
    )
    print(res['input_ids'])
    print(len(res['input_ids']))
    print(dec_tokenizer.decode(res['input_ids']))
    print()
    print()

# help(tokenizer)
The code you provided is incomplete; please provide minimal reproduction code. The following lines depend on definitions that were not included:
j_dic = json.loads(data[i])
feat = template(get_title_result(j_dic), get_asr_result(j_dic),
                get_ocr_result(j_dic), get_faces_result(j_dic))
label = get_label(j_dic)
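For instance, a self-contained snippet along the following lines, with the unavailable helpers replaced by a placeholder string, should be enough to reproduce the decode error:

from paddlenlp.transformers import AutoTokenizer

dec_tokenizer = AutoTokenizer.from_pretrained('gpt-cpm-small-cn-distill')
dec_tokenizer.add_special_tokens({'pad_token': dec_tokenizer.eos_token})

res = dec_tokenizer(
    text='样例文本',  # placeholder; any short string should do
    max_length=256,
    padding=True,
    truncation=True,
    return_token_type_ids=False,
    return_attention_mask=True,
)
print(res['input_ids'])
print(dec_tokenizer.decode(res['input_ids']))  # TypeError raised here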