Does GenRMPlugin process the data redundantly in the grpo + gen_rm pipeline? #4846

Closed
@vbhome6666

Description

@vbhome6666

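Under the current Trainer logic, the input to gen_rm (i.e., the sampled data) has already been gathered across all processes, so its shape is (gradient_step_accumulation * num_processes * bs_per_device, ). The call to gen_rm also does not appear to check the process index (for example, `if accelerator.is_main_process:`). Does this mean that every process redundantly processes all of the sampled data, making gen_rm very slow?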
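To make the concern concrete, here is a minimal sketch of one way to avoid the duplication: each process scores only its own contiguous slice of the gathered samples, and the per-rank results are then concatenated. This is not the Trainer's or GenRMPlugin's actual code. The names `shard_bounds`, `score_local_shard`, and `simulate_all_ranks` are hypothetical, `score_fn` stands in for the gen_rm call, and the ranks are simulated in a loop so the example runs without a distributed setup.

```python
from typing import Callable, List, Sequence, Tuple


def shard_bounds(total: int, rank: int, world_size: int) -> Tuple[int, int]:
    """Return the [start, end) range of `total` items owned by `rank`.

    The first `total % world_size` ranks get one extra item, so every
    item is assigned to exactly one rank.
    """
    base, extra = divmod(total, world_size)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


def score_local_shard(
    samples: Sequence,
    rank: int,
    world_size: int,
    score_fn: Callable,
) -> List:
    """Score only this rank's slice instead of the full gathered batch."""
    start, end = shard_bounds(len(samples), rank, world_size)
    return [score_fn(sample) for sample in samples[start:end]]


def simulate_all_ranks(
    samples: Sequence,
    world_size: int,
    score_fn: Callable,
) -> List:
    """Stand-in for a real all-gather: run each rank in turn and
    concatenate the partial results in rank order."""
    rewards: List = []
    for rank in range(world_size):
        rewards.extend(score_local_shard(samples, rank, world_size, score_fn))
    return rewards


if __name__ == "__main__":
    calls = []

    def fake_gen_rm(sample: int) -> int:
        # Hypothetical reward function; records each call to count work done.
        calls.append(sample)
        return sample * 2

    rewards = simulate_all_ranks(list(range(10)), world_size=4, score_fn=fake_gen_rm)
    print(rewards)
    print(len(calls))  # one call per sample, not one per sample per rank
```

In a real run, `rank` and `world_size` would presumably come from `accelerator.process_index` and `accelerator.num_processes`, and the partial rewards would need a collective gather so that every process ends up with the full reward list. Whether that fits the Trainer's data flow is an assumption that would need to be checked against the actual code.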
