PyTerrier - v1.0
🔍 Retrieve. 🧠 Rerank. 💬 Answer. ⚙️ Experiment.
Build sparse, learned-sparse, or dense indexing and retrieval pipelines for search and RAG use cases, and conduct experiments on standard datasets.
For example, build a re-ranking pipeline combining a Terrier BM25 retriever and the MonoT5 neural reranker (each of these is a PyTerrier Transformer class):
```python
import pyterrier as pt
import pyterrier_t5

bm25 = pt.terrier.TerrierIndex.from_hf("pyterrier/vaswani.terrier").bm25() % 100
monot5 = bm25 >> pt.text.get_text(pt.get_dataset('irds:vaswani')) >> pyterrier_t5.MonoT5ReRanker()
monot5.search("What are chemical reactions?")
```

In notebook environments, PyTerrier transformers and pipelines can be visualised.
You can easily build pipelines for query expansion, learning-to-rank, dense retrieval and even RAG.
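For instance, a query-expansion pipeline can be composed with the same `>>` operator. The sketch below uses RM3 pseudo-relevance feedback via `pt.rewrite.RM3`; it assumes the same pre-built Vaswani index as above, and that `RM3` accepts that index object as its index argument (check the PyTerrier documentation for the exact signature in your version):

```python
import pyterrier as pt

# Load a pre-built Terrier index and a BM25 retriever over it
index = pt.terrier.TerrierIndex.from_hf("pyterrier/vaswani.terrier")
bm25 = index.bm25()

# RM3 pseudo-relevance feedback: retrieve, expand the query
# from the top-ranked documents, then retrieve again
rm3_pipe = bm25 >> pt.rewrite.RM3(index) >> bm25
rm3_pipe.search("chemical reactions")
```

Because the expanded pipeline is itself a Transformer, it can be re-ranked, evaluated, or combined with other pipelines just like `bm25` alone.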
Once you have working pipelines, you can formulate an experiment to compare their effectiveness using the pt.Experiment function:
```python
from pyterrier.measures import *

pt.Experiment(
    [bm25, monot5],
    pt.get_dataset('vaswani').get_topics(),
    pt.get_dataset('vaswani').get_qrels(),
    [nDCG@10, AP@100]
)
```

You can easily perform retrieval experiments using many standard datasets, including all from the ir_datasets package. E.g., use `pt.datasets.get_dataset("irds:medline/2004/trec-genomics-2004")` to get the TREC Genomics 2004 dataset. A full catalogue of ir_datasets is available here.
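Each dataset object exposes the topics, qrels and corpus needed for experiments. A minimal sketch, using the small Vaswani dataset from above (the larger TREC Genomics corpus would be fetched the same way but involves a substantial download):

```python
import pyterrier as pt

dataset = pt.get_dataset("irds:vaswani")
topics = dataset.get_topics()   # DataFrame with 'qid' and 'query' columns
qrels = dataset.get_qrels()     # DataFrame with 'qid', 'docno' and 'label' columns
print(topics.head())
```

These DataFrames are exactly what `pt.Experiment` expects as its topics and qrels arguments.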
The easiest way to get started with PyTerrier is to use one of our Colab notebooks - look for the badges below.
```shell
pip install 'pyterrier[all]'
```

You may need to set the `JAVA_HOME` environment variable if Pyjnius cannot find your Java installation.
PyTerrier has additional plugins for everything from dense retrieval to RAG:
- Pyterrier_DR: [Github] - single-representation dense retrieval
- Pyterrier_RAG: [Github] - retrieval augmented generation and LLM access
- PyTerrier_ColBERT: [Github] - multiple-representation dense retrieval and/or neural reranking
- PyTerrier_PISA: [Github] - fast in-memory indexing and retrieval using PISA
- PyTerrier_T5: [Github] - neural reranking: monoT5, duoT5
- PyTerrier_GenRank: [Github] - generative listwise reranking: RankVicuna, RankZephyr
- PyTerrier_doc2query: [Github] - neural augmented indexing
- PyTerrier_SPLADE: [Github] - neural augmented indexing
You can see examples of how to use these, including notebooks that run on Google Colab, in the contents of our Search Solutions 2022 tutorial.
PyTerrier is subject to the terms detailed in the Mozilla Public License Version 2.0. The Mozilla Public License can be found in the file LICENSE.txt. By using this software, you have agreed to the licence.
The source and binary forms of PyTerrier are subject to the following citation license:
By downloading and using PyTerrier, you agree to cite the undernoted paper describing PyTerrier in any kind of material you produce where PyTerrier was used to conduct search or experimentation, whether it be a research paper, dissertation, article, poster, presentation, or documentation. By using this software, you have agreed to the citation licence.
@inproceedings{pyterrier2020ictir,
author = {Craig Macdonald and Nicola Tonellotto},
title = {Declarative Experimentation in Information Retrieval using PyTerrier},
booktitle = {Proceedings of ICTIR 2020},
year = {2020}
}
- Craig Macdonald, University of Glasgow
- Sean MacAvaney, University of Glasgow
- Nicola Tonellotto, University of Pisa
- Alex Tsolov, University of Glasgow
- Arthur Câmara, TU Delft
- Alberto Ueda, Federal University of Minas Gerais
- Chentao Xu, University of Glasgow
- Sarawoot Kongyoung, University of Glasgow
- Zhan Su, Copenhagen University
- Marcus Schutte, TU Delft
- Lukas Zeit-Altpeter, Friedrich Schiller University Jena
