Error when training ernie-3.0-tiny-micro-v2-zh following the ernie-1.0 documentation
Traceback (most recent call last):
  File "run_pretrain.py", line 762, in <module>
    do_train(config)
  File "run_pretrain.py", line 459, in do_train
    train_data_loader, valid_data_loader, test_data_loader = create_pretrained_dataset(
  File "run_pretrain.py", line 73, in create_pretrained_dataset
    train_ds, valid_ds, test_ds = build_train_valid_test_datasets(
  File "/opt/llm_pretrain/data_tools/dataset_utils.py", line 621, in build_train_valid_test_datasets
    output = get_datasets_weights_and_num_samples(data_prefix, train_valid_test_num_samples)
  File "/opt/llm_pretrain/data_tools/dataset_utils.py", line 140, in get_datasets_weights_and_num_samples
    assert weight_sum > 0.0
AssertionError
The data was prepared with the following script:
python create_pretraining_data.py \
    --model_name ernie-3.0-tiny-micro-v2-zh \
    --tokenizer_name ErnieTokenizer \
    --input_path ./data/llm_data.jsonl \
    --split_sentences \
    --chinese \
    --cn_whole_word_segment \
    --cn_seg_func jieba \
    --output_prefix llm_data \
    --workers 32 \
    --log_interval 10000
The documentation says the output files are in npy and npz format, but mine are bin and idx. Is something wrong with my data processing?
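To make the format question concrete, the following is a minimal sketch that reports which of the two file layouts exists for a given output prefix. The suffix pairs (`_ids.npy`/`_idx.npz` and `.bin`/`.idx`) and the helper name `detect_formats` are assumptions for illustration, based only on the formats named in the question. They are not taken from the PaddleNLP source, so adjust them to match the files actually on disk.

```python
from pathlib import Path

# Assumed suffix pairs for the two layouts mentioned in the question.
# These names are illustrative and may differ between PaddleNLP versions.
ASSUMED_FORMATS = {
    "npy/npz": ("_ids.npy", "_idx.npz"),
    "bin/idx": (".bin", ".idx"),
}


def detect_formats(prefix):
    """Return the assumed formats whose files all exist for this prefix."""
    prefix = str(prefix)
    found = []
    for name, suffixes in ASSUMED_FORMATS.items():
        # A format counts as present only if every file of the pair exists.
        if all(Path(prefix + suffix).is_file() for suffix in suffixes):
            found.append(name)
    return found


if __name__ == "__main__":
    # "llm_data" is the --output_prefix used in the command above.
    print(detect_formats("llm_data"))
```

If the list printed for the prefix does not include the format that the training script looks for, the data tools and the training script may come from different versions, which is worth checking alongside the installed paddle and paddlenlp versions.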
Which versions of paddle and paddlenlp are you using?
wawltor