Behind The Curtains is a project that provides clear, practical implementations of recommendation systems for news articles.

BTC: Behind The Curtains for Recommender Systems

Python 3.8+ | License: MIT | Code style: black

BTC is a modular, extensible framework for news recommendation research. It implements state-of-the-art models with a focus on reproducibility and ease of use. The project draws inspiration from newsreclib, which is built on PyTorch Lightning, but BTC uses Keras instead because Keras is widely adopted and many state-of-the-art models are implemented directly in it.

🌟 Features

  • 📚 Multiple SOTA news recommendation models
  • 🔄 Easy-to-use training and evaluation pipeline
  • 📦 Comprehensive metrics and evaluation
  • 🎛️ Hydra-based configuration system
  • 🚀 Weights & Biases integration for experiment tracking
  • 🔌 Modular design for easy extension

🏗️ Supported Models

  • NRMS: Neural News Recommendation with Multi-Head Self-Attention
  • NAML: Neural News Recommendation with Attentive Multi-View Learning
  • (More models coming soon)

📦 Supported Datasets

  • MIND: Microsoft News Dataset (small and large versions)
  • (More datasets coming soon)

🚀 Quick Start

Prerequisites

  1. Install Poetry (Python package manager):
curl -sSL https://install.python-poetry.org | python3 -
  2. Verify the Poetry installation:
poetry --version
  3. Use a single Python version within the supported range:
Python >=3.9,<3.12

Installation

  1. Clone the repository:
git clone https://github.com/igor17400/BTC.git
cd BTC
  2. Configure Poetry to create the virtual environment in the project directory:
poetry config virtualenvs.in-project true
  3. Install dependencies and create the virtual environment:
# Create virtual environment and install dependencies
poetry install

# Activate the virtual environment
poetry shell
  4. Set up pre-commit hooks:
poetry run pre-commit install
  5. You might need to install TensorFlow with CUDA support so that it can use your GPUs:
pip install 'tensorflow[and-cuda]'

To check that TensorFlow can see your GPU, run:

python test_tensorflow_gpu.py

Expected output:

✅ If TensorFlow detects a GPU, it will list it.
❌ If the output is an empty list ([]), TensorFlow is not using a GPU.
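
The contents of test_tensorflow_gpu.py are not shown in this README. As a rough sketch of an equivalent check, assuming it relies on TensorFlow's tf.config.list_physical_devices, something like the following would produce the output described above. The function name list_gpus is hypothetical and not taken from the repository.

```python
def list_gpus():
    """Return the GPU devices TensorFlow can see.

    Returns an empty list when no GPU is visible or when TensorFlow
    is not installed in the current environment.
    """
    try:
        # Imported lazily so the check fails softly without TensorFlow.
        import tensorflow as tf
    except ImportError:
        return []
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    gpus = list_gpus()
    if gpus:
        print("GPUs detected:", gpus)
    else:
        print("No GPU detected:", gpus)
```

An empty list here usually means either a CPU-only TensorFlow build or missing CUDA libraries, which is the case the tensorflow[and-cuda] install is meant to address.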

Note: You can also run commands without activating the shell using poetry run, for example:

poetry run python src/train.py

Training a Model

# Train with default configuration (NRMS on MIND-small)
poetry run python src/train.py

# Train NRMS on MIND-small
poetry run python src/train.py experiment=nrms_mind_small

Evaluation

# Evaluate the best model
poetry run python src/test.py experiment=nrms_mind_small

📁 Project Structure

BTC/
├── configs/                 # Hydra configuration files
│   ├── config.yaml         # Base configuration
│   ├── model/              # Model-specific configs
│   └── dataset/            # Dataset-specific configs
├── src/
│   ├── models/             # Model implementations
│   │   ├── base.py        # Abstract base classes
│   │   ├── nrms.py        # NRMS implementation
│   │   └── naml.py        # NAML implementation
│   ├── datasets/           # Dataset implementations
│   │   ├── base.py        # Abstract dataset class
│   │   └── mind.py        # MIND dataset
│   ├── utils/              # Utility functions
│   │   └── metrics.py     # Evaluation metrics
│   ├── train.py           # Training script
│   └── test.py            # Testing script
├── tests/                  # Unit tests
├── pyproject.toml         # Poetry configuration
└── README.md              # This file

📦 Metrics

The framework reports the following evaluation metrics:

  • AUC (Area Under ROC Curve)
  • MRR (Mean Reciprocal Rank)
  • nDCG@5 and nDCG@10 (Normalized Discounted Cumulative Gain)
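
To make the three metrics concrete, here is a minimal pure-Python sketch for a single impression with binary click labels. It assumes the convention commonly used with MIND (metrics computed per impression, then averaged across impressions); the actual code in src/utils/metrics.py may differ, and the function names below are illustrative only.

```python
import math


def _ranked_labels(labels, scores):
    """Return the labels reordered by descending predicted score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def auc(labels, scores):
    """Fraction of (clicked, non-clicked) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mrr(labels, scores):
    """Mean of the reciprocal ranks of all clicked items."""
    ranked = _ranked_labels(labels, scores)
    return sum(l / rank for rank, l in enumerate(ranked, start=1)) / sum(ranked)


def ndcg_at_k(labels, scores, k):
    """DCG of the predicted ranking divided by the DCG of the ideal ranking."""
    def dcg(ranked):
        return sum(
            (2 ** l - 1) / math.log2(pos + 1)
            for pos, l in enumerate(ranked[:k], start=1)
        )
    return dcg(_ranked_labels(labels, scores)) / dcg(sorted(labels, reverse=True))


# One impression: five candidate articles, two of which were clicked.
labels = [0, 1, 0, 1, 0]
scores = [0.1, 0.9, 0.3, 0.2, 0.8]
print(auc(labels, scores))           # 4 of 6 pairs ordered correctly
print(mrr(labels, scores))           # clicked items at ranks 1 and 4
print(ndcg_at_k(labels, scores, 5))
```

In this example the clicked articles land at ranks 1 and 4, so AUC is 2/3, MRR is 0.625, and nDCG@5 is about 0.877.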

🔧 Configuration

The project uses Hydra for configuration management. Key configuration files:

  • configs/config.yaml: Base configuration
  • configs/model/*.yaml: Model-specific configurations
  • configs/dataset/*.yaml: Dataset-specific configurations

Example configuration override:

poetry run python src/train.py \
    model=naml \
    dataset.dataset.version=large \
    train.batch_size=64 \
    train.learning_rate=0.001
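
In the command above, model=naml selects a file from the configs/model/ group, while the dotted keys override individual values in the composed configuration. The sketch below illustrates only the second part: how a dotted key maps onto a nested config. It is a simplified stand-in, not Hydra's implementation, and the base values shown are hypothetical placeholders rather than the repository's defaults.

```python
import copy


def parse_value(text):
    """Convert an override value to int or float when possible, else keep it as str."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_overrides(config, overrides):
    """Return a copy of config with each 'a.b.c=value' override applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        key, _, raw = item.partition("=")
        *parents, leaf = key.split(".")
        node = result
        for name in parents:
            # Walk down the nested dicts, creating levels that do not exist yet.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw)
    return result


# Hypothetical base values, for illustration only.
base = {
    "dataset": {"dataset": {"version": "small"}},
    "train": {"batch_size": 32, "learning_rate": 0.0001},
}
merged = apply_overrides(
    base,
    ["dataset.dataset.version=large", "train.batch_size=64", "train.learning_rate=0.001"],
)
print(merged)
```

The base dictionary is left untouched, which mirrors the idea that command-line overrides affect a single run and not the YAML files on disk.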

🧪 Testing

# Run all tests
poetry run pytest

# Run tests with coverage
poetry run pytest --cov=src

🚀 Experiment Tracking

The framework integrates with Weights & Biases for experiment tracking:

  1. Set up your W&B account
  2. Enable tracking in config:
logging:
  enable_wandb: true
  project_name: "your-project"
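
How the training script consumes these keys is not shown here. One plausible sketch, assuming the logging section is available as a plain dictionary and that tracking starts with wandb.init, is given below. The helper name maybe_init_wandb is hypothetical.

```python
def maybe_init_wandb(logging_cfg, run_config=None):
    """Start a W&B run if tracking is enabled; otherwise return None.

    logging_cfg is assumed to hold the keys from the config snippet above:
    enable_wandb (bool) and project_name (str).
    """
    if not logging_cfg.get("enable_wandb", False):
        return None
    # Imported only when tracking is enabled, so wandb stays optional.
    import wandb
    return wandb.init(project=logging_cfg["project_name"], config=run_config or {})


if __name__ == "__main__":
    # With tracking disabled, no W&B account or network access is needed.
    run = maybe_init_wandb({"enable_wandb": False, "project_name": "your-project"})
    print("W&B run:", run)
```

Keeping the import inside the function means a run with enable_wandb set to false does not require the wandb package or a login.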

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🚀 Analytics and Visualization

The framework provides rich analytics and visualization capabilities:

User Analytics

  • User reading patterns and preferences
  • Category and subcategory affinity
  • Temporal interaction patterns
  • Topic interest word clouds
  • Interactive user journey timelines

Content Analytics

  • Long-tail distribution analysis
  • Category and subcategory distributions
  • Click-through rate analysis
  • Time-of-day content preferences
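
As an illustration of the click-through rate analysis, the sketch below computes a per-article CTR from MIND-style impression strings, where each candidate is written as a news ID followed by -1 (clicked) or -0 (not clicked). This is an assumption about how such an analysis could be done, not the framework's own code, and the function name is hypothetical.

```python
from collections import Counter


def click_through_rates(impression_rows):
    """Map each news ID to clicks / impressions over all given rows."""
    shown, clicked = Counter(), Counter()
    for row in impression_rows:
        for item in row.split():
            # Split on the last hyphen: "N55689-1" -> ("N55689", "1").
            news_id, _, label = item.rpartition("-")
            shown[news_id] += 1
            clicked[news_id] += int(label)
    return {news_id: clicked[news_id] / shown[news_id] for news_id in shown}


rows = ["N1-1 N2-0", "N1-0 N2-0 N3-1"]
print(click_through_rates(rows))
```

Here N1 is clicked in one of its two impressions, N2 in none of two, and N3 in its only impression.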

Recommendation Analytics

  • Recommendation diversity metrics
  • Temporal recommendation distribution
  • Popularity vs. novelty analysis
  • Topic diversity visualization

To generate visualizations:
