Commit e09b00d

[grpo] fix off-policy check (#4852)
1 parent 46d2744 commit e09b00d

1 file changed, +1 -1 lines changed

swift/trainers/rlhf_trainer/grpo_trainer.py

Lines changed: 1 addition & 1 deletion
@@ -1433,7 +1433,7 @@ def _process_infer_requests_images(self, infer_requests: InputsType):
         return
 
     def old_policy(self):
-        return self.num_iterations > 1 or self.args.steps_per_generation > self.args.gradient_accumulation_steps
+        return self.num_iterations > 1 or self.args.gradient_accumulation_steps % self.args.steps_per_generation != 0
 
     @property
     def _queue(self):
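
To see where the two conditions in the diff above diverge, here is a minimal standalone sketch. The free functions `old_check` and `new_check` are hypothetical stand-ins for the body of `old_policy` before and after this change; they take plain integers instead of reading `self.args`, and the sample values are illustrative only, not taken from the repository.

```python
def old_check(num_iterations: int, steps_per_generation: int,
              gradient_accumulation_steps: int) -> bool:
    """Condition removed by this commit (hypothetical free-function form)."""
    return num_iterations > 1 or steps_per_generation > gradient_accumulation_steps


def new_check(num_iterations: int, steps_per_generation: int,
              gradient_accumulation_steps: int) -> bool:
    """Condition added by this commit (hypothetical free-function form)."""
    return num_iterations > 1 or gradient_accumulation_steps % steps_per_generation != 0


# Illustrative (steps_per_generation, gradient_accumulation_steps) pairs,
# all evaluated with num_iterations = 1.
samples = [(4, 4), (8, 4), (2, 4), (3, 4)]

# Collect the pairs on which the two conditions disagree.
disagreements = [
    (spg, gas)
    for spg, gas in samples
    if old_check(1, spg, gas) != new_check(1, spg, gas)
]
print(disagreements)
```

Among these samples, the two conditions disagree only when `steps_per_generation` is smaller than `gradient_accumulation_steps` without dividing it evenly; the old condition returned `False` there, while the new one returns `True`. Note that the sketch, like the new expression in the diff, assumes `steps_per_generation` is nonzero.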
