Welcome to the "Awesome Search" repository!
This repository accompanies our paper and provides a comprehensive review and unified framework for tree search-based methods, showing how these algorithms enable scalable and efficient test-time reasoning in LLMs.
Dive into this repository to explore how innovative search-based methods like MCTS are reshaping reasoning capabilities in LLMs!
🔔 🔔 🔔 For more detailed information, please refer to our paper.
✉️ ➡️ 📪 If you have any questions, please feel free to contact us at:
{weijiaqi, yangyuejin}@pjlab.org.cn | [email protected]
As training-time scaling of large language models (LLMs) reaches diminishing returns, attention has shifted toward scalable test-time reasoning algorithms. Chain-of-Thought (CoT) reasoning has emerged as a promising approach, producing intermediate reasoning steps in text space. However, traditional CoT methods explore only a single reasoning path, which limits their ability to cover complex reasoning spaces.
To address this limitation, recent works have adopted tree search-based reasoning frameworks, inspired by classical search algorithms such as Depth-First Search (DFS), Breadth-First Search (BFS), and Monte Carlo Tree Search (MCTS). These methods demonstrate significant potential in balancing exploration and exploitation, enabling LLMs to efficiently solve complex tasks at test time.
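To ground the discussion, here is a minimal, illustrative sketch of MCTS over reasoning steps. It is not any specific paper's implementation; `propose_steps` and `score` are toy stand-ins we assume in place of an LLM step proposer and a reward signal (e.g., a process reward model):

```python
import math
import random

class Node:
    """One partial reasoning trace: the path of steps from the root question."""
    def __init__(self, steps, parent=None):
        self.steps = steps      # reasoning steps accumulated so far
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0        # running sum of backed-up rewards

    def uct(self, c=1.4):
        if self.visits == 0:    # explore unvisited children first
            return float("inf")
        exploit = self.value / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def propose_steps(steps, k=3):
    """Toy stand-in (assumption): an LLM would propose k candidate next steps."""
    return [steps + [f"step{len(steps)}.{i}"] for i in range(k)]

def score(steps):
    """Toy stand-in (assumption): a reward model would judge the partial trace."""
    return random.random()

def mcts(question, iterations=100, max_depth=4):
    root = Node([question])
    for _ in range(iterations):
        node = root
        # 1) Selection: descend via UCT until reaching a leaf.
        while node.children:
            node = max(node.children, key=Node.uct)
        # 2) Expansion: grow the leaf unless the depth budget is spent.
        if len(node.steps) < max_depth:
            node.children = [Node(s, parent=node) for s in propose_steps(node.steps)]
            node = random.choice(node.children)
        # 3) Evaluation: a reward estimate stands in for a full rollout.
        reward = score(node.steps)
        # 4) Backpropagation: update statistics along the selected path.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Commit to the most-visited branch, as is standard in MCTS.
    return max(root.children, key=lambda n: n.visits).steps

print(mcts("What is 17 * 24?"))
```

Unlike a single CoT rollout, the visit statistics let the search revisit promising branches and abandon dead ends, which is exactly the exploration-exploitation balance noted above.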
This repository provides a comprehensive framework for tree search-based reasoning in LLMs, aiming to unify and advance the field. Our primary contributions include:
- **A Unified Formalism:** We propose a structured mathematical framework to analyze and compare tree search algorithms, focusing on their core mechanisms, reasoning reward formulations, and application domains. Specifically, we formalize the role of "reward" as a transient guidance signal in test-time search (a standard selection rule is sketched just after this list).
- **A Systematic Taxonomy:** We categorize existing search algorithms along three primary axes:
  - The search mechanism (e.g., DFS, BFS, MCTS)
  - The reward formulation
  - The application domain

  This taxonomy provides clarity for researchers and practitioners navigating this evolving field.
- **A Synthesis of Applications and Future Directions:** We map the primary applications of tree search reasoning, including mathematical reasoning, data generation, and optimization. Additionally, we highlight key areas for future research, such as improving general-purpose reasoning capabilities and enhancing scalability.
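For concreteness, the selection rule at the heart of most MCTS-based methods surveyed here is the standard UCT criterion (shown as background; the paper's own notation may differ):

```math
a^{*} = \arg\max_{a \in \mathcal{A}(s)} \left[ Q(s, a) + c \sqrt{\frac{\ln N(s)}{N(s, a)}} \right]
```

Here $Q(s, a)$ is the mean backed-up reward of candidate step $a$ at partial reasoning trace $s$, $N(\cdot)$ counts visits, and $c$ trades exploration off against exploitation. The reward that populates $Q(s, a)$ is exactly the transient guidance signal formalized above: it steers the search at test time rather than serving as a training objective.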
Our survey highlights the transformative potential of tree search-based reasoning frameworks in overcoming the limitations of traditional CoT methods. By providing a unified formalism, systematic taxonomy, and practical insights, we aim to establish a robust foundation for advancing LLM test-time reasoning.
For more details, please refer to our full paper or explore the examples and implementations provided in this repository.
- Part 1: MCTS for Direct Inference-Time Enhancement
- Part 2: MCTS for Self-Improvement via Data Generation
- Part 3: Advanced Topics and Hybrid Approaches
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models, Yao et al.
- Reasoning with Language Model is Planning with World Model, Hao et al.
- AlphaZero-Like Tree-Search Can Guide Large Language Model Decoding and Training, Feng et al.
- Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning, Wang et al.
- PPL-MCTS: Constrained Textual Generation Through Discriminator-Guided MCTS Decoding, Chaffin et al.
- Don't Throw Away Your Value Model! Generating More Preferable Text with Value-Guided Monte-Carlo Tree Search Decoding, Liu et al.
- ARGS: Alignment as Reward-Guided Search, Khanov et al.
- When Is Tree Search Useful for LLM Planning? It Depends on the Discriminator, Chen et al.
- Interpretable Contrastive Monte Carlo Tree Search Reasoning, Gao et al.
- Synthetic Data Generation from Real Data Sources Using Monte Carlo Tree Search and Large Language Models, Locowic et al.
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search, Zhang et al.
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing, Tian et al.
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning, Xie et al.
- Accessing GPT-4 Level Mathematical Olympiad Solutions via Monte Carlo Tree Self-Refine with LLaMa-3 8B, Zhang et al.
- No Train Still Gain. Unleash Mathematical Reasoning of Large Language Models with Monte Carlo Tree Search Guided by Energy Function, Xu et al.
- Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers, Qi et al.
- AlphaMath Almost Zero: Process Supervision Without Process, Chen et al.
- LiteSearch: Efficacious Tree Search for LLM, Wang et al.
- Markov Chain of Thought for Efficient Mathematical Reasoning, Yang et al.
- OVM, Outcome-Supervised Value Models for Planning in Mathematical Reasoning, Yu et al.
- MindStar: Enhancing Math Reasoning in Pre-Trained LLMs at Inference Time, Kang et al.
- LLaMA-Berry: Pairwise Optimization for Olympiad-Level Mathematical Reasoning via O1-like Monte Carlo Tree Search, Zhang et al.
- Beyond Examples: High-Level Automated Reasoning Paradigm in In-Context Learning via MCTS, Wu et al.
- BoostStep: Boosting Mathematical Capability of Large Language Models via Improved Single-Step Reasoning, Zhang et al.
- Step-Level Value Preference Optimization for Mathematical Reasoning, Chen et al.
- Improve Mathematical Reasoning in Language Models by Automated Process Supervision, Luo et al.
- What Are Step-Level Reward Models Rewarding? Counterintuitive Findings from MCTS-Boosted Mathematical Reasoning, Ma et al.
- Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning, Park et al.
- rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking, Guan et al.
- Leveraging Constrained Monte Carlo Tree Search to Generate Reliable Long Chain-of-Thought for Mathematical Reasoning, Lin et al.
- Planning With Large Language Models for Code Generation, Zhang et al.
- Make Every Move Count: LLM-Based High-Quality RTL Code Generation Using MCTS, DeLorenzo et al.
- RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation, Li et al.
- VerMCTS: Synthesizing Multi-Step Programs Using a Verifier, a Large Language Model, and Tree Search, Brandfonbrener et al.
- Generating Code World Models With Large Language Models Guided by Monte Carlo Tree Search, Dainese et al.
- Planning In Natural Language Improves LLM Search For Code Generation, Wang et al.
- O1-Coder: An O1 Replication for Coding, Zhang et al.
- SRA-MCTS: Self-Driven Reasoning Augmentation With Monte Carlo Tree Search for Code Generation, Xu et al.
- SWE-Search: Enhancing Software Agents With Monte Carlo Tree Search and Iterative Refinement, Antoniades et al.
- PepTune: De Novo Generation of Therapeutic Peptides With Multi-Objective-Guided Discrete Diffusion, Tang et al.
- BFS-Prover: Scalable Best-First Tree Search for LLM-Based Automatic Theorem Proving, Xin et al.
- MCTS-Judge: Test-Time Scaling in LLM-as-a-Judge for Code Correctness Evaluation, Wang et al.
- SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution, Li et al.
- Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving, Zimmer et al.
- APRMCTS: Improving LLM-Based Automated Program Repair With Iterative Tree Search, Hu et al.
- MCTS-Refined CoT: High-Quality Fine-Tuning Data for LLM-Based Repository Issue Resolution, Wang et al.
- Tree Search for Language Model Agents, Koh et al.
- Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models, Zhai et al.
- Large Language Models as Commonsense Knowledge for Large-Scale Task Planning, Zhao et al.
- Can Large Language Models Play Games? A Case Study of A Self-Play Approach, Guo et al.
- Planning with Large Language Models for Conversational Agents, Li et al.
- Improving Autonomous AI Agents with Reflective Tree Search and Self-Learning, Yu et al.
- REX: Rapid Exploration and eXploitation for AI Agents, Murthy et al.
- SELA: Tree-Search Enhanced LLM Agents for Automated Machine Learning, Chi et al.
- ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search, Zhuang et al.
- Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents, Putta et al.
- Information Directed Tree Search: Reasoning and Planning with Language Agents, Chandak et al.
- LLM-A*: Large Language Model Enhanced Incremental Heuristic Search on Path Planning, Meng et al.
- Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search, Light et al.
- Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models, Zhou et al.
- Multi-Agent Sampling: Scaling Inference Compute for Data Synthesis with Tree Search-Based Agentic Collaboration, Ye et al.
- SAPIENT: Mastering Multi-Turn Conversational Recommendation with Strategic Planning and Monte Carlo Tree Search, Du et al.
- WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration, Zhang et al.
- A Training Data Recipe to Accelerate A* Search with Language Models, Gupta et al.
- Planning Like Human: A Dual-Process Framework for Dialogue Planning, He et al.
- Prompt-Based Monte-Carlo Tree Search for Goal-Oriented Dialogue Policy Planning, Yu et al.
- Monte Carlo Tree Search for Comprehensive Exploration in LLM-Based Automatic Heuristic Design, Zheng et al.
- Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training, Yuan et al.
- MASTER: A Multi-Agent System with LLM Specialized MCTS, Gan et al.
- Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search, Shi et al.
- SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents, Lin et al.
- WebSynthesis: World-Model-Guided MCTS for Efficient WebUI-Trajectory Synthesis, Gao et al.
- Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills, Xie et al.
- AgentSwift: Efficient LLM Agent Design via Value-Guided Hierarchical Search, Li et al.
- HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems, Hou et al.
- AgentXploit: End-to-End Redteaming of Black-Box AI Agents, Wang et al.
- A Novel Approach to Optimize Large Language Models for Named Entity Matching with Monte Carlo Tree Search, Volkova et al.
- KNOT-MCTS: An Effective Approach to Addressing Hallucinations in Generative Language Modeling for Question Answering, Wu et al.
- Search-in-the-Chain: Interactively Enhancing Large Language Models with Search for Knowledge-Intensive Tasks, Xu et al.
- Monte Carlo Thought Search: Large Language Model Querying for Complex Scientific Reasoning in Catalyst Design, Sprueill et al.
- RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement, Jiang et al.
- RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models, Tran et al.
- CORAG: A Cost-Constrained Retrieval Optimization System for Retrieval-Augmented Generation, Wang et al.
- Think&Cite: Improving Attributed Text Generation with Self-Guided Tree Search and Progress Reward Modeling, Li et al.
- RiTeK: A Dataset for Large Language Models Complex Reasoning Over Textual Knowledge Graphs, Huang et al.
- KCTS: Knowledge-Constrained Tree Search Decoding with Token-Level Hallucination Detection, Choi et al.
- AirRAG: Activating Intrinsic Reasoning for Retrieval Augmented Generation Using Tree-Based Search, Feng et al.
- MedS3: Towards Medical Small Language Models with Self-Evolved Slow Thinking, Jiang et al.
- KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search, Luo et al.
- MCTS-KBQA: Monte Carlo Tree Search for Knowledge Base Question Answering, Xiong et al.
- Enhancing Test-Time Scaling of Large Language Models with Hierarchical Retrieval-Augmented MCTS, Dou et al.
- MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search, Hu et al.
- Toward Structured Knowledge Reasoning: Contrastive Retrieval-Augmented Generation on Experience, Gu et al.
- FREESON: Retriever-Free Retrieval-Augmented Reasoning via Corpus-Traversing MCTS, Kim et al.
- Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs, Wei et al.
- Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search, Yao et al.
- Progressive Multimodal Reasoning via Active Retrieval, Dong et al.
- Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking, Wu et al.
- MMC: Iterative Refinement of VLM Reasoning via MCTS-Based Multimodal Critique, Liu et al.
- DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding, Li et al.
- MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision, Du et al.
- AlphaZero-Like Tree-Search Can Guide Large Language Model Decoding and Training, Feng et al.
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing, Tian et al.
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning, Xie et al.
- Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents, Putta et al.
- Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers, Qi et al.
- AlphaMath Almost Zero: Process Supervision Without Process, Chen et al.
- CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks, Wang et al.
- Step-Level Value Preference Optimization for Mathematical Reasoning, Chen et al.
- Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning, Wang et al.
- rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking, Guan et al.
- Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training, Yuan et al.
- Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search, Shi et al.
- TreeRPO: Tree Relative Policy Optimization, Yang et al.
- MCTS-Refined CoT: High-Quality Fine-Tuning Data for LLM-Based Repository Issue Resolution, Wang et al.
- ASTRO: Teaching Language Models to Reason by Reflecting and Backtracking In-Context, Kim et al.
- PPL-MCTS: Constrained Textual Generation Through Discriminator-Guided MCTS Decoding, Chaffin et al.
- Don't Throw Away Your Value Model! Generating More Preferable Text with Value-Guided Monte-Carlo Tree Search Decoding, Liu et al.
- ARGS: Alignment as Reward-Guided Search, Khanov et al.
- Improving Autonomous AI Agents with Reflective Tree Search and Self-Learning, Yu et al.
- PromptAgent: Strategic Planning with Language Models Enables Expert-Level Prompt Optimization, Wang et al.
- Dynamic Rewarding with Prompt Optimization Enables Tuning-Free Self-Alignment of Language Models, Singla et al.
- Optimizing Instruction Synthesis: Effective Exploration of Evolutionary Space with Tree Search, Li et al.
- STAIR: Improving Safety Alignment with Introspective Reasoning, Zhang et al.
- APRMCTS: Improving LLM-Based Automated Program Repair with Iterative Tree Search, Hu et al.
- Can Large Language Models Play Games? A Case Study of a Self-Play Approach, Guo et al.
- A Novel Approach to Optimize Large Language Models for Named Entity Matching with Monte Carlo Tree Search, Volkova et al.
- Synthetic Data Generation from Real Data Sources Using Monte Carlo Tree Search and Large Language Models, Locowic et al.
- Monte Carlo Thought Search: Large Language Model Querying for Complex Scientific Reasoning in Catalyst Design, Sprueill et al.
- Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search, Light et al.
- Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking, Cheng et al.
- PepTune: De Novo Generation of Therapeutic Peptides with Multi-Objective-Guided Discrete Diffusion, Tang et al.
- Multi-Agent Sampling: Scaling Inference Compute for Data Synthesis with Tree Search-Based Agentic Collaboration, Ye et al.
- What Are Step-Level Reward Models Rewarding? Counterintuitive Findings from MCTS-Boosted Mathematical Reasoning, Ma et al.
- Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning, Park et al.
- Think&Cite: Improving Attributed Text Generation with Self-Guided Tree Search and Progress Reward Modeling, Li et al.
- SAPIENT: Mastering Multi-Turn Conversational Recommendation with Strategic Planning and Monte Carlo Tree Search, Du et al.
- Monte Carlo Tree Search for Comprehensive Exploration in LLM-Based Automatic Heuristic Design, Zheng et al.
- Prompt-Based Monte Carlo Tree Search for Mitigating Hallucinations in Large Models, Duan et al.
- MedS3: Towards Medical Small Language Models with Self-Evolved Slow Thinking, Jiang et al.
- Lemma: Learning from Errors for Mathematical Advancement in LLMs, Pan et al.
- Towards Stepwise Domain Knowledge-Driven Reasoning Optimization and Reflection Improvement, Liu et al.
- Trans-Zero: Self-Play Incentivizes Large Language Models for Multilingual Translation Without Parallel Data, Zou et al.
- Iris: Interactive Research Ideation System for Accelerating Scientific Discovery, Garikaparthi et al.
- Monte Carlo Planning with Large Language Model for Text-Based Game Agents, Shi et al.
- MCTSr-Zero: Self-Reflective Psychological Counseling Dialogues Generation via Principles and Adaptive Exploration, Lu et al.
- SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement, Wang et al.
- MMC: Iterative Refinement of VLM Reasoning via MCTS-Based Multimodal Critique, Liu et al.
- MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision, Du et al.
- MASTER: A Multi-Agent System with LLM Specialized MCTS, Gan et al.
- SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution, Li et al.
- Multi-LLM Collaborative Search for Complex Problem Solving, Yang et al.
- HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems, Hou et al.
- OVM, Outcome-Supervised Value Models for Planning in Mathematical Reasoning, Yu et al.
- Let’s Reward Step by Step: Step-Level Reward Model as the Navigators for Reasoning, Ma et al.
- Improve Mathematical Reasoning in Language Models by Automated Process Supervision, Luo et al.
- What Are Step-Level Reward Models Rewarding? Counterintuitive Findings from MCTS-Boosted Mathematical Reasoning, Ma et al.
- Your Reward Function for RL Is Your Best PRM for Search: Unifying RL and Search-Based TTS, Jin et al.
- MT-RewardTree: A Comprehensive Framework for Advancing LLM-Based Machine Translation via Reward Modeling, Feng et al.
- Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models, Wang et al.
- Information Directed Tree Search: Reasoning and Planning with Language Agents, Chandak et al.
- LiteSearch: Efficacious Tree Search for LLM, Wang et al.
- BoostStep: Boosting Mathematical Capability of Large Language Models via Improved Single-Step Reasoning, Zhang et al.
- Bilevel MCTS for Amortized O(1) Node Selection in Classical Planning, Asai et al.
- Skip a Layer or Loop It? Test-Time Depth Adaptation of Pretrained LLMs, Li et al.
- Time-Critical and Confidence-Based Abstraction Dropping Methods, Schmocker et al.
- Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search, Inoue et al.
Contributions are highly encouraged!
If you have a relevant paper that complements this taxonomy, feel free to submit a pull request or reach out to the authors directly.
Your support will help expand and improve this repository!
If you find this project helpful in your research, please consider citing:
```bibtex
@article{wei2025unifying,
  title={Unifying Tree Search Algorithm and Reward Design for LLM Reasoning: A Survey},
  author={Wei, Jiaqi and Zhang, Xiang and Yang, Yuejin and Huang, Wenxuan and Cao, Juntai and Xu, Sheng and Zhuang, Xiang and Gao, Zhangyang and Abdul-Mageed, Muhammad and Lakshmanan, Laks VS and others},
  journal={arXiv preprint arXiv:2510.09988},
  year={2025}
}
```
