BTC is a modular and extensible framework for news recommendation research. It implements state-of-the-art models with a focus on reproducibility and ease of use. The project draws inspiration from newsreclib, which is built on PyTorch Lightning, but uses Keras instead because of its widespread adoption and because many state-of-the-art models are implemented directly in Keras.
- 📚 Multiple SOTA news recommendation models
- 🔄 Easy-to-use training and evaluation pipeline
- 📦 Comprehensive metrics and evaluation
- 🎛️ Hydra-based configuration system
- 🚀 Weights & Biases integration for experiment tracking
- 🔌 Modular design for easy extension
- NRMS: Neural News Recommendation with Multi-Head Self-Attention
- NAML: Neural News Recommendation with Attentive Multi-View Learning
- (More models coming soon)
- MIND: Microsoft News Dataset (small and large versions)
- (More datasets coming soon)
- Install Poetry (Python package manager):
curl -sSL https://install.python-poetry.org | python3 -
- Verify Poetry installation:
poetry --version
- Make sure you are using a supported Python version:
Python >=3.9,<3.12
- Clone the repository:
git clone https://github.com/igor17400/BTC.git
cd BTC
- Configure Poetry to create virtual environment in project directory:
poetry config virtualenvs.in-project true
- Install dependencies and create virtual environment:
# Create virtual environment and install dependencies
poetry install
# Activate the virtual environment
poetry shell
- Set up pre-commit hooks:
poetry run pre-commit install
- You might need to install TensorFlow with the following command to make sure it works with your GPUs:
pip install 'tensorflow[and-cuda]'
To check that the installation worked, we recommend running the following command:
python test_tensorflow_gpu.py
Expected output:
✅ If TensorFlow detects a GPU, it will list it.
❌ If the output is an empty list ([]), TensorFlow is not using a GPU.
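The contents of test_tensorflow_gpu.py are not shown here. A minimal sketch of an equivalent check, assuming only TensorFlow's public tf.config.list_physical_devices call, might look like the following. The function name list_gpus and the ImportError fallback are illustrative additions, not part of the repository.

```python
def list_gpus():
    """Return the GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf  # imported lazily so the check fails gracefully
    except ImportError:
        return None
    # An empty list means TensorFlow is running on CPU only.
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    gpus = list_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment.")
    elif gpus:
        print(f"✅ TensorFlow detected {len(gpus)} GPU(s): {gpus}")
    else:
        print("❌ [] (TensorFlow is not using a GPU)")
```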
Note: You can also run commands without activating the shell by using poetry run, for example:
poetry run python src/train.py
# Train with default configuration (NRMS on MIND-small)
poetry run python src/train.py
# Train NRMS on MIND-small
poetry run python src/train.py experiment=nrms_mind_small
# Evaluate the best model
poetry run python src/test.py experiment=nrms_mind_small
BTC/
├── configs/ # Hydra configuration files
│ ├── config.yaml # Base configuration
│ ├── model/ # Model-specific configs
│ └── dataset/ # Dataset-specific configs
├── src/
│ ├── models/ # Model implementations
│ │ ├── base.py # Abstract base classes
│ │ ├── nrms.py # NRMS implementation
│ │ └── naml.py # NAML implementation
│ ├── datasets/ # Dataset implementations
│ │ ├── base.py # Abstract dataset class
│ │ └── mind.py # MIND dataset
│ ├── utils/ # Utility functions
│ │ └── metrics.py # Evaluation metrics
│ ├── train.py # Training script
│ └── test.py # Testing script
├── tests/ # Unit tests
├── pyproject.toml # Poetry configuration
└── README.md # This file
The framework provides comprehensive evaluation metrics:
- AUC (Area Under ROC Curve)
- MRR (Mean Reciprocal Rank)
- nDCG@5 and nDCG@10 (Normalized Discounted Cumulative Gain)
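To make the three metrics concrete, here is a dependency-free sketch for a single impression (one user's candidate list with binary click labels). It follows the conventions commonly used for MIND evaluation, in which MRR is averaged over all clicked items. Whether src/utils/metrics.py uses exactly these definitions is an assumption, and all function names below are hypothetical.

```python
import math


def _rank_labels(labels, scores):
    """Return the labels reordered by descending predicted score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def auc(labels, scores):
    """Probability that a clicked item is scored above a non-clicked one (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mrr(labels, scores):
    """Mean of 1/rank over all clicked items in the ranked list."""
    clicks = sum(labels)
    if clicks == 0:
        return 0.0
    ranked = _rank_labels(labels, scores)
    return sum(l / rank for rank, l in enumerate(ranked, start=1)) / clicks


def dcg_at_k(ranked_labels, k):
    """Discounted cumulative gain of the top-k items with gain 2^label - 1."""
    return sum(
        (2 ** l - 1) / math.log2(rank + 1)
        for rank, l in enumerate(ranked_labels[:k], start=1)
    )


def ndcg_at_k(labels, scores, k):
    """DCG of the predicted ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(_rank_labels(labels, scores), k) / ideal


if __name__ == "__main__":
    labels = [0, 1, 0, 0, 1]              # toy click labels, made up for illustration
    scores = [0.1, 0.9, 0.3, 0.2, 0.25]   # toy model scores, made up for illustration
    print(f"AUC    = {auc(labels, scores):.4f}")
    print(f"MRR    = {mrr(labels, scores):.4f}")
    print(f"nDCG@5 = {ndcg_at_k(labels, scores, 5):.4f}")
```

In practice each metric is computed per impression and then averaged over all impressions in the evaluation set.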
The project uses Hydra for configuration management. Key configuration files:
- configs/config.yaml: Base configuration
- configs/model/*.yaml: Model-specific configurations
- configs/dataset/*.yaml: Dataset-specific configurations
Example configuration override:
poetry run python src/train.py \
model=naml \
dataset.dataset.version=large \
train.batch_size=64 \
train.learning_rate=0.001
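Dotted overrides such as train.batch_size=64 address nested keys in the composed configuration. The sketch below imitates that behaviour on a plain dictionary to show what the command above changes. It is a simplified illustration, not Hydra's implementation: the base values are made up, the helper names are hypothetical, and a group selection such as model=naml (which presumably swaps in configs/model/naml.yaml) is not modelled.

```python
import copy


def _parse_value(text):
    """Interpret a value as int, then float, then fall back to a plain string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted `key=value` overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Walk down the nested dictionaries, creating levels that are missing.
            node = node.setdefault(name, {})
        node[leaf] = _parse_value(raw_value)
    return result


# Hypothetical base configuration; the real defaults live in configs/config.yaml.
base = {
    "train": {"batch_size": 32, "learning_rate": 0.0001},
    "dataset": {"dataset": {"version": "small"}},
}

updated = apply_overrides(
    base,
    [
        "dataset.dataset.version=large",
        "train.batch_size=64",
        "train.learning_rate=0.001",
    ],
)

if __name__ == "__main__":
    print(updated)
```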
# Run all tests
poetry run pytest
# Run tests with coverage
poetry run pytest --cov=src
The framework integrates with Weights & Biases for experiment tracking:
- Set up your W&B account
- Enable tracking in config:
logging:
  enable_wandb: true
  project_name: "your-project"
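How the training script consumes these keys is not shown here. One plausible pattern, sketched below, is to start a run only when enable_wandb is true. The helper maybe_init_wandb is hypothetical, and the only W&B call used is wandb.init with its project and config arguments.

```python
def maybe_init_wandb(logging_cfg, run_config=None):
    """Start a W&B run if tracking is enabled; otherwise return None."""
    if not logging_cfg.get("enable_wandb", False):
        return None
    import wandb  # imported lazily so the package is only required when tracking is on

    return wandb.init(
        project=logging_cfg["project_name"],
        config=run_config or {},  # hyperparameters to record alongside the run
    )


if __name__ == "__main__":
    # With tracking disabled nothing is sent to W&B and no login is required.
    run = maybe_init_wandb({"enable_wandb": False, "project_name": "your-project"})
    print("W&B run started" if run is not None else "W&B tracking disabled")
```

With tracking enabled, wandb.init requires that you have logged in first, for example with the wandb login command.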
We welcome contributions! Please see our Contributing Guidelines for details.
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
- NRMS: Neural News Recommendation with Multi-Head Self-Attention
- NAML: Neural News Recommendation with Attentive Multi-View Learning
- MIND: MIND: A Large-scale Dataset for News Recommendation
This project is licensed under the MIT License - see the LICENSE file for details.
The framework provides rich analytics and visualization capabilities:
- User reading patterns and preferences
- Category and subcategory affinity
- Temporal interaction patterns
- Topic interest word clouds
- Interactive user journey timelines
- Long-tail distribution analysis
- Category and subcategory distributions
- Click-through rate analysis
- Time-of-day content preferences
- Recommendation diversity metrics
- Temporal recommendation distribution
- Popularity vs. novelty analysis
- Topic diversity visualization
To generate visualizations: