information-extractor is a Python package that combines spaCy, coreferee, and SpanBERT to extract structured relationships between entities in natural language text. It is purpose-built for anyone who wants to unify NER, coreference resolution, and relation extraction in a single streamlined pipeline.
- Uses spaCy with coreferee to resolve pronouns and link entity mentions.
- Flexible support for multiple entity types: `PERSON`, `ORG`, `LOC`, `DATE`, etc.
- Uses a fine-tuned SpanBERT model trained on TACRED.
- Handles subject/object marking and context-aware classification.
- Confidence scoring and de-duplication of extracted relations (see the sketch after this list).
- GPU acceleration supported out of the box.
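This README does not spell out how de-duplication works; the following is a minimal sketch of one plausible approach, which keeps only the highest-confidence copy of each (subject, relation, object) triple. The dict shape matches the example output shown later, but the `deduplicate` helper itself is hypothetical, not part of the package API.

```python
# Hypothetical helper: keep the highest-confidence copy of each triple.
def deduplicate(relations: list[dict]) -> list[dict]:
    best = {}
    for rel in relations:
        key = (rel["subject"], rel["relation"], rel["object"])
        if key not in best or rel["confidence"] > best[key]["confidence"]:
            best[key] = rel
    return list(best.values())
```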
Basic CLI usage:

```bash
ie --text "Barack Obama was born in Hawaii." [--deps]
```

- `--deps`: downloads and installs the required pretrained models if they are not already present.
Install from PyPI:

```bash
pip install information-extractor
```
Run the following once to download the SpanBERT weights, the spaCy model, and the coreferee model:

```bash
ie --deps
```
Alternatively, you can import and run the dependency script directly:
```python
from information_extractor.dependency import setup_dependencies

setup_dependencies()
```
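The README does not document what `setup_dependencies` does internally. Conceptually, it amounts to something like the sketch below, which uses spaCy's real `spacy.cli.download` helper and coreferee's documented `python -m coreferee install en` command; the function body is an assumption, not the package's actual code.

```python
import subprocess
import sys

import spacy


def setup_dependencies_sketch():
    """Illustrative only: fetch pretrained models if they are missing."""
    # spaCy model: download it only if loading fails.
    try:
        spacy.load("en_core_web_md")
    except OSError:
        spacy.cli.download("en_core_web_md")

    # coreferee English model, via its documented install command.
    subprocess.run([sys.executable, "-m", "coreferee", "install", "en"], check=True)

    # The fine-tuned SpanBERT weights would be fetched from the project's
    # GitHub release assets here (see the download sketch further down).
```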
Basic usage in Python:

```python
from information_extractor.pipeline import RelationExtractor

text = "Sundar Pichai is the CEO of Google. He lives in California."

extractor = RelationExtractor()
results = extractor.extract(text)

for relation in results:
    print(relation)
```
Example output:

```json
[
  {
    "subject": "Sundar Pichai",
    "object": "Google",
    "relation": "per:employee_of",
    "confidence": 0.92
  },
  ...
]
```
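Since each relation carries a `confidence` score, a natural post-processing step is to filter out low-confidence triples. The threshold below is illustrative, not a package default:

```python
# Continuing from the example above: drop low-confidence triples.
CONFIDENCE_THRESHOLD = 0.8  # illustrative cutoff, tune for your use case

high_confidence = [r for r in results if r["confidence"] >= CONFIDENCE_THRESHOLD]
for rel in high_confidence:
    print(f'{rel["subject"]} --{rel["relation"]}--> {rel["object"]}')
```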
Project layout:

```
information_extractor/
├── assets/
│   └── pretrained_spanbert/
├── dependency.py   # Downloads all model dependencies
├── pipeline.py     # Core logic for NLP + SpanBERT
└── main.py         # CLI entrypoint
```
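For orientation, the entrypoint in `main.py` plausibly looks something like the sketch below. The flag names mirror the documented `ie --text "..." [--deps]` usage and the imports match the modules above; everything else is an assumption about the implementation:

```python
import argparse

from information_extractor.dependency import setup_dependencies
from information_extractor.pipeline import RelationExtractor


def main():
    # Flag names mirror the documented CLI: ie --text "..." [--deps]
    parser = argparse.ArgumentParser(prog="ie")
    parser.add_argument("--text", required=True,
                        help="text to extract relations from")
    parser.add_argument("--deps", action="store_true",
                        help="download required pretrained models if not present")
    args = parser.parse_args()

    if args.deps:
        setup_dependencies()

    extractor = RelationExtractor()
    for relation in extractor.extract(args.text):
        print(relation)


if __name__ == "__main__":
    main()
```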
Models are downloaded from hosted GitHub release assets:
- ✅ SpanBERT weights & config
- ✅ `en_core_web_md` spaCy model
- ✅ `coreferee_model_en` for coreference resolution
- ✅ `torch` wheel for reproducibility
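The release-asset URLs themselves are not listed in this README; as a rough illustration, streaming a large model file from a GitHub release generally looks like this (the URL is a placeholder, and the real asset locations are presumably defined in `information_extractor/dependency.py`):

```python
import requests

# Placeholder URL, not the project's actual release asset.
ASSET_URL = "https://github.com/<owner>/<repo>/releases/download/<tag>/spanbert_weights.tar.gz"


def download_asset(url: str, dest: str) -> None:
    """Stream a release asset to disk so large model files never sit fully in memory."""
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as fh:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                fh.write(chunk)
```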
This project builds on the work of Facebook Research. If you use SpanBERT, please cite:
```bibtex
@article{joshi2019spanbert,
  title={{SpanBERT}: Improving Pre-training by Representing and Predicting Spans},
  author={Mandar Joshi and Danqi Chen and Yinhan Liu and Daniel S. Weld and Luke Zettlemoyer and Omer Levy},
  journal={arXiv preprint arXiv:1907.10529},
  year={2019}
}
```
MIT. See LICENSE for full terms.
Note: This project redistributes pretrained model weights for convenience under fair use for research.