BTC: Behind The Curtains for Recommender Systems

BTC is a modular, extensible framework for news recommendation research. It implements state-of-the-art models with a focus on reproducibility and ease of use. The project draws inspiration from newsreclib, which is built on PyTorch Lightning. BTC uses Keras instead, because Keras is widely adopted and many state-of-the-art models are implemented directly in it.

🌟 Features

📚 Multiple SOTA news recommendation models
🔄 Easy-to-use training and evaluation pipeline
📦 Comprehensive metrics and evaluation
🎛️ Hydra-based configuration system
🚀 Weights & Biases integration for experiment tracking
🔌 Modular design for easy extension

🏗️ Supported Models

NRMS: Neural News Recommendation with Multi-Head Self-Attention
NAML: Neural News Recommendation with Attentive Multi-View Learning
(More models coming soon)

📦 Supported Datasets

MIND: Microsoft News Dataset (small and large versions)
(More datasets coming soon)

🚀 Quick Start

Prerequisites

Install Poetry (Python package manager):

curl -sSL https://install.python-poetry.org | python3 -

Verify Poetry installation:

poetry --version

Use a single Python version within the supported range:

Python >=3.9,<3.12

Installation

Clone the repository:

git clone https://github.com/igor17400/BTC.git
cd BTC

Configure Poetry to create virtual environment in project directory:

poetry config virtualenvs.in-project true

Install dependencies and create virtual environment:

# Create virtual environment and install dependencies
poetry install

# Activate the virtual environment (on Poetry 2.x, `poetry shell` requires the poetry-plugin-shell plugin)
poetry shell

Set up pre-commit hooks:

poetry run pre-commit install

If TensorFlow does not detect your GPU, you may need to reinstall it with CUDA support:

pip install 'tensorflow[and-cuda]'

To check that the installation worked, run the GPU test script:

python test_tensorflow_gpu.py

Expected output:

✅ If TensorFlow detects a GPU, it will list it.
❌ If the output is an empty list ([]), TensorFlow is not using a GPU.
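The contents of test_tensorflow_gpu.py are not reproduced in this README. As a rough idea of what such a check involves, the sketch below asks TensorFlow for its visible GPU devices through `tf.config.list_physical_devices`. The helper name `summarize_gpus` and the message wording are assumptions made for illustration, not the repository's code.

```python
def summarize_gpus(devices):
    """Turn a list of detected GPU devices into a one-line status message."""
    if not devices:
        return "No GPU detected: TensorFlow will run on the CPU."
    return f"{len(devices)} GPU(s) detected: {list(devices)}"


try:
    import tensorflow as tf
except ImportError:
    # TensorFlow is missing entirely, so there is nothing to query.
    print("TensorFlow is not installed in this environment.")
else:
    # An empty list here means TensorFlow cannot see any GPU.
    print(summarize_gpus(tf.config.list_physical_devices("GPU")))
```

If the list is empty even after installing `tensorflow[and-cuda]`, the usual suspects are the NVIDIA driver version or running outside the Poetry virtual environment.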

Note: You can also run commands without activating the shell using poetry run, for example:

poetry run python src/train.py

Training a Model

# Train with default configuration (NRMS on MIND-small)
poetry run python src/train.py

# Train NRMS on MIND-small
poetry run python src/train.py experiment=nrms_mind_small

Evaluation

# Evaluate the best model
poetry run python src/test.py experiment=nrms_mind_small

📁 Project Structure

BTC/
├── configs/                 # Hydra configuration files
│   ├── config.yaml         # Base configuration
│   ├── model/              # Model-specific configs
│   └── dataset/            # Dataset-specific configs
├── src/
│   ├── models/             # Model implementations
│   │   ├── base.py        # Abstract base classes
│   │   ├── nrms.py        # NRMS implementation
│   │   └── naml.py        # NAML implementation
│   ├── datasets/           # Dataset implementations
│   │   ├── base.py        # Abstract dataset class
│   │   └── mind.py        # MIND dataset
│   ├── utils/              # Utility functions
│   │   └── metrics.py     # Evaluation metrics
│   ├── train.py           # Training script
│   └── test.py            # Testing script
├── tests/                  # Unit tests
├── pyproject.toml         # Poetry configuration
└── README.md              # This file

📦 Metrics

The framework reports the following ranking metrics:

AUC (Area Under ROC Curve)
MRR (Mean Reciprocal Rank)
nDCG@5 and nDCG@10 (Normalized Discounted Cumulative Gain)
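To make the three metrics concrete, here is a minimal pure-Python sketch that scores a single impression (one list of candidate articles with click labels and model scores). It is an illustration under stated assumptions, not the code in src/utils/metrics.py: the function names are hypothetical, and the MRR variant shown averages over all clicked items in the impression, a convention often used for MIND that BTC may or may not follow.

```python
import math


def auc_score(labels, scores):
    """Probability that a clicked item is scored above a non-clicked one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Count pairwise wins; ties count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def _ranked_labels(labels, scores):
    """Labels reordered by descending model score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def mrr_score(labels, scores):
    """Mean of the reciprocal ranks of the clicked items."""
    ranked = _ranked_labels(labels, scores)
    reciprocal = [1.0 / (rank + 1) for rank, y in enumerate(ranked) if y == 1]
    return sum(reciprocal) / len(reciprocal) if reciprocal else 0.0


def dcg_at_k(ranked_labels, k):
    """Discounted cumulative gain over the top-k positions."""
    return sum((2 ** y - 1) / math.log2(i + 2) for i, y in enumerate(ranked_labels[:k]))


def ndcg_score(labels, scores, k):
    """DCG@k divided by the DCG@k of the ideal ordering."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(_ranked_labels(labels, scores), k) / ideal


# One impression: two clicked articles among five candidates.
labels = [0, 1, 0, 0, 1]
scores = [0.1, 0.9, 0.4, 0.8, 0.3]
print(auc_score(labels, scores), mrr_score(labels, scores), ndcg_score(labels, scores, 5))
```

In practice these per-impression values are averaged over all impressions in the evaluation split.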

🔧 Configuration

The project uses Hydra for configuration management. Key configuration files:

configs/config.yaml: Base configuration
configs/model/*.yaml: Model-specific configurations
configs/dataset/*.yaml: Dataset-specific configurations

Example configuration override:

poetry run python src/train.py \
    model=naml \
    dataset.dataset.version=large \
    train.batch_size=64 \
    train.learning_rate=0.001
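The dotted keys in the override above address nested fields of the composed configuration. The sketch below imitates that effect on a plain dictionary so the mapping is easy to see. It is not how Hydra works internally (Hydra composes configs through OmegaConf and has its own override grammar), and it deliberately skips config-group selections such as `model=naml`, which swap in a whole file from configs/model/. The base values and the helper names `apply_overrides` and `_parse_scalar` are hypothetical.

```python
import copy


def _parse_scalar(text):
    """Interpret an override value as int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_overrides(config, overrides):
    """Return a copy of `config` with `a.b.c=value` overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Walk down the nesting, creating levels that do not exist yet.
            node = node.setdefault(name, {})
        node[leaf] = _parse_scalar(raw_value)
    return result


# Hypothetical base values; the real defaults live in configs/config.yaml.
base = {
    "dataset": {"dataset": {"version": "small"}},
    "train": {"batch_size": 32, "learning_rate": 0.0001},
}
cfg = apply_overrides(
    base,
    ["dataset.dataset.version=large", "train.batch_size=64", "train.learning_rate=0.001"],
)
print(cfg)
```

The base dictionary is left untouched, which mirrors the way command-line overrides affect a single run without editing the YAML files.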

🧪 Testing

# Run all tests
poetry run pytest

# Run tests with coverage
poetry run pytest --cov=src

🚀 Experiment Tracking

The framework integrates with Weights & Biases for experiment tracking:

Set up your W&B account
Enable tracking in config:

logging:
  enable_wandb: true
  project_name: "your-project"

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Fork the repository
Create a feature branch
Commit your changes
Push to the branch
Create a Pull Request

📚 References

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🚀 Analytics and Visualization

The framework includes analytics and visualizations at three levels: users, content, and recommendations.

User Analytics

User reading patterns and preferences
Category and subcategory affinity
Temporal interaction patterns
Topic interest word clouds
Interactive user journey timelines

Content Analytics

Long-tail distribution analysis
Category and subcategory distributions
Click-through rate analysis
Time-of-day content preferences
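As an illustration of the click-through rate analysis listed above, the sketch below computes CTR per news category from MIND-style impression strings, where each candidate appears as `<news id>-<label>` and the label is 1 for a click and 0 otherwise. The function names, the sample IDs, and the category mapping are made up for this example, and BTC's own analytics code may be organized differently.

```python
from collections import defaultdict


def parse_impressions(field):
    """Parse 'N1-1 N2-0' into [('N1', 1), ('N2', 0)]."""
    pairs = []
    for token in field.split():
        news_id, label = token.rsplit("-", 1)
        pairs.append((news_id, int(label)))
    return pairs


def ctr_by_category(impression_fields, news_category):
    """Clicks divided by times shown, grouped by news category."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for field in impression_fields:
        for news_id, label in parse_impressions(field):
            # Articles missing from the mapping are grouped under "unknown".
            category = news_category.get(news_id, "unknown")
            shown[category] += 1
            clicked[category] += label
    return {category: clicked[category] / shown[category] for category in shown}


# Toy data: three articles and two impressions.
news_category = {"N1": "sports", "N2": "sports", "N3": "finance"}
impressions = ["N1-1 N2-0 N3-0", "N1-0 N3-1"]
print(ctr_by_category(impressions, news_category))
```

The same grouping idea extends to subcategories or time-of-day buckets by changing the key used for `shown` and `clicked`.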

Recommendation Analytics

Recommendation diversity metrics
Temporal recommendation distribution
Popularity vs. novelty analysis
Topic diversity visualization

To generate visualizations:
