[rank2]: batch_encoded_inputs = self._prepare_batch_inputs(inputs, total_rewards)
[rank2]: File "/opt/conda/lib/python3.10/site-packages/swift/trainers/rlhf_trainer/grpo_trainer.py", line 1026, in _prepare_batch_inputs
[rank2]: assert len(inputs) == bs * gas, f'Expected {bs * gas} inputs, got {len(inputs)}'
[rank2]: AssertionError: Expected 32 inputs, got 4

I hit this error while training a model with GRPO. Here is the training script:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=6 \
swift rlhf \
    --rlhf_type grpo \
    --model ${model_path} \
    --external_plugins ${external_plugins_name} \
    --reward_funcs custom_acc custom_format \
    --use_vllm true \
    --vllm_device auto \
    --vllm_gpu_memory_utilization 0.9 \
    --vllm_max_model_len 5192 \
    --num_infer_workers 2 \
    --num_generations 24 \
    --train_type lora \
    --lora_rank 64 \
    --lora_alpha 256 \
    --torch_dtype bfloat16 \
    --dataset ${data_path} \
    --max_completion_length 3072 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --learning_rate 1e-6 \
    --gradient_accumulation_steps 8 \
    --save_steps 1 \
    --save_total_limit 20 \
    --logging_steps 1 \
    --max_length 5192 \
    --output_dir ${model_output} \
    --warmup_ratio 0.01 \
    --dataloader_num_workers 4 \
    --dataset_num_proc 4 \
    --temperature 0.7 \
    --top_p 0.95 \
    --top_k 20 \
    --deepspeed zero3 \
    --log_completions true
    # --eval_steps 200

I have a couple of questions. I followed the official GRPO.md example script, which does not explicitly specify eval_datasets. Should it be specified explicitly here, and if it is not, how is the dataset split by default?
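For reference, the numbers in the assertion line up with the flags in the script. The sketch below is my reading of the check, not taken from grpo_trainer.py itself, so treat the per-rank accounting as an assumption:

# Sketch: where 'Expected 32 inputs, got 4' plausibly comes from (assumption:
# the assert counts per-rank samples per optimizer step).
bs=4     # --per_device_train_batch_size
gas=8    # --gradient_accumulation_steps
echo "expected inputs: $((bs * gas))"   # prints 32, matching the assertion
# 'got 4' equals a single micro-batch (bs=4), i.e. one step's worth of inputs
# instead of a full accumulation window. Also note the device split: of the
# 8 visible GPUs, 6 are training ranks (NPROC_PER_NODE=6) and 2 serve vLLM
# (--num_infer_workers 2); 6 ranks x bs 4 = 24 = --num_generations.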
@Jintao-Huang Could you please take a look?
The shape issue has been resolved in the main branch.
By default, the eval_dataset is split off from the train_dataset at a ratio of 0.01 (1%), as determined by the split_dataset_ratio parameter.
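If you want to control the split explicitly, two options are sketched below. split_dataset_ratio is the parameter named above; --val_dataset as the flag for passing a dedicated eval file is my assumption from the swift CLI and worth verifying against your version:

# Option 1: widen the automatic split (default 0.01, i.e. 1% of train):
swift rlhf \
    --rlhf_type grpo \
    --dataset ${data_path} \
    --split_dataset_ratio 0.05
# Option 2 (assumed flag name): pass a dedicated eval set instead of splitting:
#   --val_dataset ${eval_data_path}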