Support for pairwise judges in online training #1194
Conversation
    from fairseq2.recipes.lm._online_finetune._rewards import (
        GenerativePairwiseVerifierHandler as GenerativePairwiseVerifierHandler,
    )
These are generic generative-judge reward classes; internally, they call the specific judgment extractors (which users have to define), as specified in the reward config:
    reward:
      name: "generative_pairwise_verifier"
      config:
        prompt_key: prompt_raw
        tokenizer: /datasets/pretrained-llms/Llama-3.1-8B-Instruct
        judgment_extractor: "j1_pairwise_score_extractor"
For scalar RMs, the "judgment_extractor" field will be empty or ignored.
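For illustration, here is a minimal sketch of what a judgment extractor such as "j1_pairwise_score_extractor" might look like, assuming a plain callable that parses the judge's generated text; the actual fairseq2 extractor registry and interface may differ:

    import re

    # Hypothetical judgment extractor (illustrative only): parses a
    # real-valued score from the judge's generated text, assuming the
    # judge is prompted to emit a line such as "Score: 7.5". Returns
    # None when no score can be recovered.
    def j1_pairwise_score_extractor(judge_output: str) -> float | None:
        match = re.search(r"Score:\s*(-?\d+(?:\.\d+)?)", judge_output)
        return float(match.group(1)) if match else None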
@jacklanchantin FYI
What does this PR do? Please describe:
Adds support for online training with any pairwise LLM-as-a-Judge that generates real-valued scores for responses.
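As a rough sketch of the idea (illustrative only; the judge prompt template, the `pairwise_rewards` helper, and the `generate_judgment` callable below are assumptions, not the fairseq2 API): the judge sees the prompt together with both candidate responses and emits one real-valued score per response, which is then parsed into rewards.

    import re
    from typing import Callable

    # Illustrative prompt asking the judge to score both responses.
    JUDGE_TEMPLATE = (
        "Prompt:\n{prompt}\n\n"
        "Response A:\n{a}\n\nResponse B:\n{b}\n\n"
        "Rate each response from 0 to 10, formatted as 'A: <score>' and 'B: <score>'."
    )

    def pairwise_rewards(
        prompt: str,
        a: str,
        b: str,
        generate_judgment: Callable[[str], str],  # wraps the judge LLM call
    ) -> tuple[float, float]:
        judgment = generate_judgment(JUDGE_TEMPLATE.format(prompt=prompt, a=a, b=b))
        # Extract one real-valued score per response; default to 0.0 if missing.
        scores = dict(re.findall(r"\b([AB]):\s*(\d+(?:\.\d+)?)", judgment))
        return float(scores.get("A", 0.0)), float(scores.get("B", 0.0))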
Fixes #{issue number}
Does your PR introduce any breaking changes? If yes, please list them:
List of all backwards-incompatible changes.
Check list: