PaLM-rlhf-pytorch is a PyTorch implementation of the Pathways Language Model (PaLM) architecture combined with Reinforcement Learning from Human Feedback (RLHF). It is designed for fine-tuning large language models toward human preferences, similar to the approach OpenAI used to train models such as ChatGPT.
Features
- Implements RLHF for fine-tuning large-scale language models
- Uses PPO (Proximal Policy Optimization) to keep reinforcement learning updates stable
- Optimized for training on distributed hardware like GPUs and TPUs
- Supports both pretraining and reward model fine-tuning
- Built on PyTorch with modular and extensible components
- Designed for experimenting with human-aligned AI training
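The PPO item in the feature list refers to a clipped surrogate objective that limits how far a single update can move the policy. The sketch below illustrates that objective in plain Python, with no dependencies. It is not taken from this project's code: the function name `ppo_clipped_loss`, its parameters, and the default `clip_eps=0.2` are assumptions made for illustration, and a real training loop would compute the same quantity with batched PyTorch tensors.

```python
import math


def ppo_clipped_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss (to be minimized) over sampled actions.

    Hypothetical helper for illustration only; not this project's API.
    """
    losses = []
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio between the updated policy and the sampling policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped objectives.
        surrogate = min(ratio * adv, clipped_ratio * adv)
        # Negate because optimizers minimize, while PPO maximizes the surrogate.
        losses.append(-surrogate)
    return sum(losses) / len(losses)


# Unchanged policy: every ratio is 1, so the loss is minus the mean advantage.
loss_same = ppo_clipped_loss([-1.0, -2.0], [-1.0, -2.0], [1.0, -1.0])

# Large policy shift with a positive advantage: the gain is capped at 1 + eps.
loss_capped = ppo_clipped_loss([0.0], [-1.0], [1.0])

# Large policy shift with a negative advantage: the penalty is not capped.
loss_uncapped = ppo_clipped_loss([0.0], [-1.0], [-1.0])
```

Taking the minimum of the two terms makes the objective a lower bound on the unclipped one: the policy gets no extra credit for moving far in a direction the advantage favors, but it is still fully penalized for moving far in a direction the advantage disfavors.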
Categories
Reinforcement Learning Frameworks
License
MIT License