Westlake University · Hangzhou (UTC +08:00)
Stars
Automatically scrapes and maintains download links for historical versions of Cursor on every platform (Windows, macOS, Linux), so users can install or downgrade to a specific version as needed.
[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving a 3×+ generation speedup on reasoning tasks
A Framework of Small-scale Large Multimodal Models
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Code release for the paper "The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains"
[EMNLP'25] A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models.
Reinforcement Learning of Vision Language Models with Self Visual Perception Reward
Jax Codebase for Evolutionary Strategies at the Hyperscale
[NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct
Official repo for ICLR 2025: Your Weak LLM is Secretly a Strong Teacher for Alignment
Official implementation of paper "Learning to Optimize Multi-objective Alignment Through Dynamic Reward Weighting"
[ICML 2025] Official code of "DAMA: Data- and Model-aware Alignment of Multi-modal LLMs"
Align Anything: Training All-modality Models with Feedback
[AAAI 2026 Oral] Implementation for "W2S-AlignTree: Weak-to-Strong Inference-Time Alignment for Large Language Models via Monte Carlo Tree Search".
"LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?"
Public code repo for paper "Alice: Proactive Learning with Teacher's Demonstrations for Weak-to-Strong Generalization"
[ICLR 2025 Spotlight] Weak-to-strong preference optimization: stealing reward from weak aligned model
[ACM MM 2025] MCM-DPO: Multifaceted Cross-Modal Direct Preference Optimization for Alt-text Generation
The collection of awesome papers on alignment of diffusion models.
(CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
Train transformer language models with reinforcement learning.
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …
An RLHF Infrastructure for Vision-Language Models
[NeurIPS 2025] Mitigating Hallucination Through Theory-Consistent Symmetric Multimodal Preference Optimization