NVIDIA Research Scientist
Pinned
- RLHFlow/RLHF-Reward-Modeling (Public): Recipes to train reward model for RLHF.
- NVlabs/NFT (Public): Implementation of Negative-aware Finetuning (NFT) algorithm for "Bridging Supervised Learning and Reinforcement Learning in Math Reasoning"