
Your Dense Retriever is Secretly an Expeditious Reasoner

💥 Highlights

In this work, we propose Adaptive Query Reasoning (AdaQR), a hybrid query rewriting framework built on the observation that, for a subset of queries, the semantic transformation induced by LLM reasoning manifests as a systematic, structured transformation in the embedding space. Within this framework, a Reasoner Router dynamically directs each query to either fast dense reasoning or deep LLM reasoning. Dense reasoning is performed by the Dense Reasoner, which carries out LLM-style reasoning directly in the embedding space, enabling a controllable trade-off between efficiency and accuracy. Experiments on the large-scale retrieval benchmark BRIGHT show that AdaQR reduces reasoning cost by 28% while preserving retrieval performance, and even improving it by 7%, across 17 widely-used LLMs and 5 embedding models.
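To make the routing concrete, here is a minimal, self-contained Python sketch of the adaptive flow described above. It is illustrative only: the embedding model, Reasoner Router, and Dense Reasoner are toy stand-ins (a pseudo-random projection, a sigmoid score, and a linear map), not the repository's implementation; see run_DenseReasoner.py and run_AdaQR.py for the real components.

import numpy as np

rng = np.random.default_rng(0)
DIM = 128
# Toy Dense Reasoner: a learned linear map applied in embedding space.
W = np.eye(DIM) + 0.01 * rng.normal(size=(DIM, DIM))

def embed(text):
    # Stand-in embedding model: a pseudo-random unit vector per text.
    r = np.random.default_rng(abs(hash(text)) % (2**32))
    v = r.normal(size=DIM)
    return v / np.linalg.norm(v)

def router_score(q):
    # Stand-in Reasoner Router: confidence that dense reasoning suffices.
    return 1.0 / (1.0 + np.exp(-q.sum()))

def adaqr_query_embedding(query, threshold=0.7):
    q = embed(query)
    if router_score(q) >= threshold:
        # Fast path: LLM-style reasoning applied directly in embedding space.
        out = W @ q
        return out / np.linalg.norm(out)
    # Slow path: full LLM query rewriting, then re-embed (LLM call stubbed out).
    return embed(query + " [LLM reasoning trace]")

print(adaqr_query_embedding("What causes aurora borealis?").shape)  # (128,)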

🎯 Usage

Installation

Install dependencies:

❯ git clone https://github.com/maple826/AdaQR
❯ cd AdaQR
❯ conda env create -f environment.yml
❯ conda activate AdaQR

Download Models and Datasets

# Download Embedding Models; BGE-M3 is used as an example.
> huggingface-cli download --resume-download BAAI/bge-m3 --local-dir ./bge-m3
# Download BRIGHT Benchmark Datasets
> huggingface-cli download --repo-type dataset --resume-download xlangai/BRIGHT --local-dir ./BRIGHT

You can download our queries rewritten by 17 widely-used LLMs on BRIGHT and StackExchange from BRIGHT_reasoned and StackExchange_reasoned, and place them in the AdaQR/BRIGHT directory after unzipping.
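Optionally, you can sanity-check the downloads with a few lines of Python. This snippet is ours, not part of the repository; it assumes the FlagEmbedding and datasets packages are available, and the config, split, and field names ("examples", "biology", "query") follow the BRIGHT dataset card.

# Verify the downloaded BGE-M3 weights produce dense embeddings.
from FlagEmbedding import BGEM3FlagModel
from datasets import load_dataset

model = BGEM3FlagModel("./bge-m3", use_fp16=True)  # --local-dir from the step above
vecs = model.encode(["What causes aurora borealis?"])["dense_vecs"]
print(vecs.shape)  # (1, 1024): one 1024-dimensional dense vector per query

# Load one BRIGHT domain (reads from the Hugging Face cache).
biology = load_dataset("xlangai/BRIGHT", "examples", split="biology")
print(biology[0]["query"][:80])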

Run Dense Reasoner

This includes both the pre-training and fine-tuning stages:

> python run_DenseReasoner.py --rewrite_llm deepseekr1 --embedding_model bge-m3

Run AdaQR

Train the Dense Reasoner first, then run AdaQR (the threshold of 0.7 matches the recommendation for BGE-M3 below):

> python run_DenseReasoner.py --rewrite_llm deepseekr1 --embedding_model bge-m3
> python run_AdaQR.py --rewrite_llm deepseekr1 --embedding_model bge-m3 --dataset BRIGHT --threshold 0.7

Arguments

- rewrite_llm: The LLM used for query rewriting. Currently, we support 17 widely-used LLMs: ["deepseekr1", "deepseekv3", "glm4", "glmz1", "kimi", "llama8b", "llama70b", "mixtral7b", "mixtral8x7b", "qwen4b", "qwen8b", "qwen14b", "qwen32b", "r1_llama70b", "r1_qwen7b", "r1_qwen14b", "r1_qwen32b"].
- embedding_model: The embedding model used for query embedding. Currently, we support 5 embedding models: ["bge-large-en-v1.5", "bge-m3", "Qwen3-Embedding-0.6B", "Qwen3-Embedding-4B", "ReasonIR-8B"].
- threshold: The threshold for the Reasoner Router in AdaQR. We set 0.75 for BGE-Large, 0.7 for BGE-M3 and ReasonIR-8B, and 0.6 for Qwen3-Embedding-0.6B and Qwen3-Embedding-4B (see the sketch below).
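As a convenience, these recommendations can be captured in a small mapping keyed by --embedding_model. This is a hypothetical helper of ours, not something the repository provides:

# Recommended --threshold per embedding model, per the list above.
RECOMMENDED_THRESHOLD = {
    "bge-large-en-v1.5": 0.75,
    "bge-m3": 0.70,
    "ReasonIR-8B": 0.70,
    "Qwen3-Embedding-0.6B": 0.60,
    "Qwen3-Embedding-4B": 0.60,
}

model = "bge-m3"
print(f"python run_AdaQR.py --rewrite_llm deepseekr1 --embedding_model {model} "
      f"--dataset BRIGHT --threshold {RECOMMENDED_THRESHOLD[model]}")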

Acknowledgements

We follow the implementation of xlang-ai/BRIGHT for evaluation.

Citation

If you find our work helpful, please cite us:

