
GRPO training of a 32B model OOMs #3871

Closed
@zhilinwang1


Using tensor parallel 8, optimizer offload, flash attention, and vLLM, training OOMs on an 8×96G machine. The full configuration and the error are below:
```shell
nproc_per_node=8

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NNODES=$nnodes \
NODE_RANK=$RANK \
MASTER_ADDR=$MASTER_ADDR \
MASTER_PORT=$MASTER_PORT \
NPROC_PER_NODE=$nproc_per_node \
swift rlhf \
    --rlhf_type grpo \
    --model xxxx/xxxxx \
    --model_type qwq \
    --attn_impl flash_attn \
    --gradient_checkpointing true \
    --reward_funcs reflection_q \
    --use_vllm false \
    --vllm_device auto \
    --vllm_gpu_memory_utilization 0.8 \
    --vllm_max_model_len 8192 \
    --num_infer_workers 8 \
    --train_type full \
    --torch_dtype bfloat16 \
    --dataset 'xxxxx.jsonl' \
    --max_completion_length 2048 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-6 \
    --gradient_accumulation_steps 8 \
    --eval_steps 200 \
    --save_steps 200 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 2048 \
    --output_dir output \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 8 \
    --dataset_num_proc 8 \
    --num_generations 8 \
    --temperature 0.9 \
    --deepspeed zero3 \
    --log_completions true \
    --sleep_level 1 \
    --offload_model true \
    --offload_optimizer true \
    --gc_collect_after_offload true \
    --tensor_parallel_size 8
```
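For context, a rough back-of-the-envelope estimate suggests why an 8×96 GB node is tight for full-parameter training of a model this size even under ZeRO-3. This is a minimal sketch in Python; the 32B parameter count, bf16 weights/gradients, and Adam-style fp32 optimizer states are assumptions, not figures confirmed by the issue:

```python
# Rough per-GPU memory estimate for full-parameter bf16 training of a
# ~32B model under DeepSpeed ZeRO-3, with parameters, gradients, and
# optimizer states all sharded across 8 GPUs.
# Assumptions: 32e9 params, bf16 weights/grads (2 bytes each),
# Adam-style fp32 master weights + m/v states (4 + 4 + 4 bytes).

params = 32e9
n_gpus = 8
GB = 1024**3

weights_bf16 = params * 2          # sharded bf16 parameters
grads_bf16 = params * 2            # sharded bf16 gradients
optim_fp32 = params * (4 + 4 + 4)  # fp32 master copy + Adam m and v

per_gpu = (weights_bf16 + grads_bf16 + optim_fp32) / n_gpus / GB
print(f"model states per GPU: ~{per_gpu:.0f} GB")  # ~60 GB

# On top of this come activations (nonzero even with gradient
# checkpointing), temporary all-gather buffers, and any memory the
# rollout engine claims, so 96 GB per GPU leaves little headroom
# unless the offload flags actually move states to CPU in time.
```

If the offloading flags are not taking effect during rollout, the remaining ~36 GB per GPU can easily be exhausted by activations and generation buffers at `max_length 2048` with 8 generations per prompt.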

Error message:

(The traceback was posted as a screenshot; the text is not available.)
