The training script is as follows:
CUDA_VISIBLE_DEVICES=0,1,2,3,4 \
NPROC_PER_NODE=4 \
swift rlhf \
--rlhf_type grpo \
--model /models/qwen/qwen_v2_5_7b_chat \
--model_type qwen2_5 \
--external_plugins /rlhf/plugin/plugin.py \
--reward_funcs external_countdown format \
--use_vllm true \
--vllm_device auto \
--vllm_gpu_memory_utilization 0.6 \
--vllm_max_model_len 1024 \
--train_type full \
--torch_dtype bfloat16 \
--dataset '/rlhf/input/data.jsonl' \
--max_length 2048 \
--max_completion_length 1024 \
--num_train_epochs 3 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--learning_rate 5e-7 \
--gradient_accumulation_steps 4 \
--eval_steps 100 \
--save_steps 100 \
--save_total_limit 5 \
--logging_steps 1 \
--output_dir output \
--warmup_ratio 0.01 \
--dataloader_num_workers 4 \
--dataset_num_proc 4 \
--num_generations 4 \
--temperature 1.0 \
--system 'You are a helpful assistant. You first thinks about the reasoning process in the mind and then provides the user with the answer.' \
--log_completions true \
--deepspeed zero3 \
--num_iterations 1 \
--beta 0.001
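For anyone reproducing this, the resource and batch arithmetic implied by the flags above can be checked with a rough sketch. It assumes (not confirmed in this report) that the GPU left over after the training processes is the one `--vllm_device auto` uses for rollouts, and that `--per_device_train_batch_size` counts completions rather than prompts. The variable names are illustrative only.

```python
# Values copied from the command above.
visible_devices = "0,1,2,3,4".split(",")  # CUDA_VISIBLE_DEVICES
nproc_per_node = 4                        # NPROC_PER_NODE
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
num_generations = 4

# GPUs not claimed by a training process (assumed to host vLLM).
spare_gpus = len(visible_devices) - nproc_per_node

# Completions consumed per optimizer step, under the assumption that
# the per-device batch size counts completions.
completions_per_step = (
    nproc_per_node * per_device_train_batch_size * gradient_accumulation_steps
)

# Distinct prompts per optimizer step if each prompt yields
# num_generations completions.
prompts_per_step = completions_per_step // num_generations

print(spare_gpus, completions_per_step, prompts_per_step)
```

If either assumption does not hold for the installed ms-swift version, the numbers need to be recomputed accordingly.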
The dataset is as follows: