This repository contains the code for a pipeline to train and evaluate a biomedical retrieval model using the GPL framework. The pipeline consists of three main stages: building a corpus with hard negatives, training the model, and evaluating its performance on various benchmark datasets. All fine-tuned models and created datasets are available in this HuggingFace Collection.
Create a new conda environment. Ensure that the Python version is below 3.11 (otherwise faiss-gpu will fail to install):

```bash
conda create -n <env_name> python=3.10.14
```

Activate it using:

```bash
conda activate <env_name>
```
Before running any of the scripts, ensure you have the necessary libraries installed. You can install them using the provided requirements.txt file:
```bash
pip install -r requirements.txt
```

The first stage, building the corpus, involves generating a dataset of queries, positive passages, and hard negatives from the PubMed abstract dataset. This is accomplished in two steps.
The build-corpus/pubmed-parser.py script downloads PubMed abstracts and constructs 2-hop citation graphs. For each starting abstract, it fetches the abstracts of its cited papers (1-hop) and the papers they cite (2-hop).
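For orientation, the 2-hop expansion can be sketched as below. This is illustrative only: it assumes the NCBI E-utilities `elink` endpoint (with the `pubmed_pubmed_refs` link) as the citation source, whereas the actual parser may fetch citations differently and also downloads the abstracts themselves.

```python
# Minimal sketch of the 2-hop expansion (illustrative; not the actual parser).
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"

def get_cited_pmids(pmid: str) -> list[str]:
    """Return the PMIDs of papers cited by `pmid` (its reference list)."""
    params = {"dbfrom": "pubmed", "db": "pubmed", "id": pmid,
              "linkname": "pubmed_pubmed_refs", "retmode": "json"}
    data = requests.get(EUTILS, params=params, timeout=30).json()
    linksetdbs = data["linksets"][0].get("linksetdbs", [])
    return [str(link) for link in linksetdbs[0]["links"]] if linksetdbs else []

def build_two_hop_graph(start_pmid: str) -> dict:
    one_hop = get_cited_pmids(start_pmid)                        # papers cited by the start abstract
    two_hop = {pmid: get_cited_pmids(pmid) for pmid in one_hop}  # papers cited by the 1-hop papers
    return {"pmid": start_pmid, "one_hop": one_hop, "two_hop": two_hop}
```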
To run this script:
```bash
python build-corpus/pubmed-parser.py
```

This will create the following file:
- `2hop-citation-graphs.jsonl`: Contains the 2-hop citation graphs, with each line representing a starting PMID and its corresponding 1-hop and 2-hop abstracts.
The build-corpus/pubmed-query-scoring.py script takes the citation graphs from the previous step and generates queries and hard negatives. It uses the T5 Doc2Query model to create a query for each positive abstract and then traverses the citation graph to find diverse hard negatives.
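Query generation with a T5 doc2query model typically follows the pattern below; the checkpoint name is an assumption, and the script may use a different doc2query variant or generation settings.

```python
# Generate one synthetic query for a positive abstract (checkpoint name is an assumption).
from transformers import T5ForConditionalGeneration, T5Tokenizer

CHECKPOINT = "doc2query/msmarco-t5-base-v1"  # assumed doc2query checkpoint
tokenizer = T5Tokenizer.from_pretrained(CHECKPOINT)
model = T5ForConditionalGeneration.from_pretrained(CHECKPOINT)

abstract = "Aspirin irreversibly inhibits cyclooxygenase-1, reducing platelet aggregation..."
inputs = tokenizer(abstract, return_tensors="pt", truncation=True, max_length=512)
outputs = model.generate(**inputs, max_length=64, do_sample=True, top_k=10, num_return_sequences=1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```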
To run this script:
```bash
python build-corpus/pubmed-query-scoring.py
```

This will produce the following file, which will be used for training:
- `hard-negatives-traversal.jsonl`: A JSONL file where each line contains a query, a positive passage, and a list of hard negative passages.
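For reference, a record from this file can be inspected as shown below; the key names are assumptions and should be checked against the script's actual output schema.

```python
# Peek at one training record (key names are illustrative, not guaranteed).
import json

with open("hard-negatives-traversal.jsonl") as f:
    example = json.loads(next(f))

print(example["query"])           # generated query (assumed key)
print(example["positive"][:200])  # positive abstract (assumed key)
print(len(example["negatives"]))  # hard negatives from the graph traversal (assumed key)
```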
The `train.py` script fine-tunes the GTE models using the data generated in the previous step. It uses a multiple negatives ranking loss to train the model to distinguish between positive and negative passages for a given query.
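For orientation, multiple negatives ranking loss fine-tuning with sentence-transformers usually looks like the minimal sketch below; it assumes `query`/`positive`/`negatives` keys in the training file and placeholder hyperparameters, and is not a copy of `train.py`.

```python
# Minimal MultipleNegativesRankingLoss fine-tuning sketch (hyperparameters are placeholders).
import json
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("thenlper/gte-base")

train_examples = []
with open("hard-negatives-traversal.jsonl") as f:
    for line in f:
        rec = json.loads(line)
        for neg in rec["negatives"]:  # one (query, positive, hard negative) triplet per negative
            train_examples.append(InputExample(texts=[rec["query"], rec["positive"], neg]))

loader = DataLoader(train_examples, shuffle=True, batch_size=32)
loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=100, output_path="output/gte")
```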
To start the training process:
```bash
python train.py
```

The script will save the fine-tuned model to the following directory:
- `output/`: This folder will contain the trained model artifacts. The specific sub-folder will depend on the `MODEL_NAME` set in the `train.py` script. For example, if `MODEL_NAME` is `'thenlper/gte-base'`, the model will be saved in `output/gte/`.
The final stage is to evaluate the performance of the trained model on various benchmark datasets. The evaluation scripts use the beir library, and the datasets are available from the BEIR GitHub repository. Make sure to download the necessary datasets and place them in the `eval_datasets/` directory. For LoTTE, the datasets are downloaded from IR Datasets.
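A typical dense-retrieval evaluation with the beir library looks roughly like the sketch below; the dataset and model paths are placeholders, and the actual script may differ.

```python
# Evaluate the fine-tuned model on one BEIR dataset (paths are placeholders).
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

corpus, queries, qrels = GenericDataLoader("eval_datasets/scifact").load(split="test")

dense_model = DRES(models.SentenceBERT("output/gte"), batch_size=128)
retriever = EvaluateRetrieval(dense_model, score_function="cos_sim")

results = retriever.retrieve(corpus, queries)
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
print(ndcg["NDCG@10"])
```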
To run the evaluation:
```bash
python eval/beir-evaluation.py
```

The `eval/cqadupstack.py` script evaluates the model on the CQADupStack benchmark, which consists of sub-datasets from different domains.
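CQADupStack results are conventionally reported as the average over its twelve sub-forums; a hedged sketch of that loop (the directory layout is an assumption) is shown below.

```python
# Average nDCG@10 over the CQADupStack sub-forums (directory layout is an assumption).
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

SUBFORUMS = ["android", "english", "gaming", "gis", "mathematica", "physics",
             "programmers", "stats", "tex", "unix", "webmasters", "wordpress"]

retriever = EvaluateRetrieval(DRES(models.SentenceBERT("output/gte"), batch_size=128),
                              score_function="cos_sim")

scores = []
for forum in SUBFORUMS:
    corpus, queries, qrels = GenericDataLoader(f"eval_datasets/cqadupstack/{forum}").load(split="test")
    results = retriever.retrieve(corpus, queries)
    ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
    scores.append(ndcg["NDCG@10"])

print("CQADupStack average nDCG@10:", sum(scores) / len(scores))
```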
To run this evaluation:
```bash
python eval/cqadupstack.py
```

The `eval/lotte-evaluation.py` script evaluates the model on the LoTTE benchmark.
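LoTTE is commonly reported with Success@5, i.e. whether any relevant passage appears in a query's top 5 results. The script's exact metrics are not shown here, but the measure itself is simple to illustrate:

```python
# Success@k illustration on toy data; not the evaluation script itself.
def success_at_k(rankings: dict[str, list[str]], relevant: dict[str, set[str]], k: int = 5) -> float:
    """Fraction of queries with at least one relevant passage in the top-k ranking."""
    hits = sum(1 for qid, ranked in rankings.items() if set(ranked[:k]) & relevant.get(qid, set()))
    return hits / len(rankings)

rankings = {"q1": ["p3", "p7", "p1"], "q2": ["p9", "p2"]}
relevant = {"q1": {"p1"}, "q2": {"p4"}}
print(success_at_k(rankings, relevant))  # 0.5 -- only q1 has a relevant passage in its top 5
```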
To run the LoTTE evaluation, you need to provide the path to the model and the data directories:
```bash
python eval/lotte-evaluation.py --model_path output/gte --data_dir eval_datasets/lotte --rankings_dir rankings --split test
```

To test the query encoding and retrieval latency, we run evaluations using the MSMARCO dataset. Run the script using the following command:
```bash
python eval/latency.py
```
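A rough way to measure query-encoding latency, assuming the fine-tuned model is loaded with sentence-transformers (the actual script may also time index search over the MSMARCO corpus):

```python
# Time query encoding over a batch of placeholder queries.
import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("output/gte")
queries = ["what is the treatment for hypertension"] * 1000  # placeholder MSMARCO-style queries

start = time.perf_counter()
model.encode(queries, batch_size=64, show_progress_bar=False)
elapsed = time.perf_counter() - start
print(f"Encoded {len(queries)} queries in {elapsed:.2f}s ({1000 * elapsed / len(queries):.2f} ms/query)")
```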
If you use this work, please cite:

```bibtex
@misc{sinha2025bicaeffectivebiomedicaldense,
      title={BiCA: Effective Biomedical Dense Retrieval with Citation-Aware Hard Negatives},
author={Aarush Sinha and Pavan Kumar S and Roshan Balaji and Nirav Pravinbhai Bhatt},
year={2025},
eprint={2511.08029},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2511.08029},
}
```