Westlake University · Hangzhou (UTC +08:00)
Stars
Automatically scrapes and maintains download links for historical versions of Cursor on every platform (Windows, macOS, Linux), so users can install or downgrade to a specific version as needed.
[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving a 3×+ generation speedup on reasoning tasks
A Framework of Small-scale Large Multimodal Models
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Code release for the paper "The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains"
[EMNLP'25] A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models.
Reinforcement Learning of Vision Language Models with Self Visual Perception Reward
Jax Codebase for Evolutionary Strategies at the Hyperscale
[NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct
Official repo for ICLR 2025: Your Weak LLM is Secretly a Strong Teacher for Alignment
Official implementation of paper "Learning to Optimize Multi-objective Alignment Through Dynamic Reward Weighting"
[ICML 2025] Official code of "DAMA: Data- and Model-aware Alignment of Multi-modal LLMs"
Align Anything: Training All-modality Models with Feedback
[AAAI 2026 Oral] Implementation for "W2S-AlignTree: Weak-to-Strong Inference-Time Alignment for Large Language Models via Monte Carlo Tree Search".
"LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?"
Public code repo for paper "Alice: Proactive Learning with Teacher's Demonstrations for Weak-to-Strong Generalization"
[ICLR 2025 Spotlight] Weak-to-strong preference optimization: stealing reward from weak aligned model
[ACM MM 2025] MCM-DPO: Multifaceted Cross-Modal Direct Preference Optimization for Alt-text Generation
The collection of awesome papers on alignment of diffusion models.
(CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
Train transformer language models with reinforcement learning.
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …
An RLHF Infrastructure for Vision-Language Models
[NeurIPS 2025] Mitigating Hallucination Through Theory-Consistent Symmetric Multimodal Preference Optimization