This repository offers a from-scratch implementation of the Transformer model introduced in the paper "Attention Is All You Need" (Vaswani et al., 2017). The Transformer has become foundational across Natural Language Processing (NLP) because its attention mechanism parallelizes well and scales to large models and datasets.
The implementation is deliberately minimalist and educational, written in Python and PyTorch. It is designed for readers who want to understand the inner workings of Transformers and experiment with the architecture.
- Encoder and decoder modules built on multi-head self-attention.
- Positional encoding to capture token order (see the sketch after this list).
- Layer normalization and residual connections for stable training.
- Configurable hyperparameters for experimentation.
- A training script and an inference notebook for model evaluation.
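To make the positional-encoding feature concrete, here is a minimal sketch of the standard sinusoidal scheme from the paper. It is illustrative only; the exact implementation in this repository's `model.py` may differ:

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Return a (seq_len, d_model) tensor of sinusoidal position encodings."""
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    # Frequencies decay geometrically across even dimensions: 10000^(-2i/d_model).
    div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                         * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe

# Example: encodings for a 10-token sequence in a 512-dimensional model.
pe = sinusoidal_positional_encoding(10, 512)
print(pe.shape)  # torch.Size([10, 512])
```

These encodings are added to the token embeddings so that attention, which is otherwise order-agnostic, can distinguish positions.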
- Clone the repository:

  ```bash
  git clone https://github.com/ZXEcoder/transformers.git
  ```

- Navigate to the project directory:

  ```bash
  cd transformers
  ```

- Install dependencies. Ensure you have Python 3.8+ installed, then install the required packages:

  ```bash
  pip install -r requirements.txt
  ```
The `config.py` file contains all hyperparameters and configuration for the model, the training process, and dataset paths. Adjust these parameters as needed before training or inference.
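For orientation, a configuration file in this style often amounts to a flat set of settings like the following. The key names below are illustrative, not the actual contents of this repository's `config.py`:

```python
# Hypothetical example of the kind of settings config.py typically holds;
# consult the actual file for the real names and values.
config = {
    "d_model": 512,        # embedding / hidden size
    "num_heads": 8,        # attention heads per layer
    "num_layers": 6,       # encoder and decoder depth
    "seq_len": 350,        # maximum sequence length
    "batch_size": 8,
    "lr": 1e-4,
    "num_epochs": 20,
    "dataset_path": "data/",     # point this at your dataset
    "model_folder": "weights/",  # where checkpoints are written
}
```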
To train the Transformer model:

```bash
python train.py
```

This script initiates training using the configuration specified in `config.py`. Ensure your dataset is prepared and its path is set correctly in the configuration file.
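As a rough mental model, a training script in this style typically follows the loop below. The function signatures and paths are illustrative assumptions, not the repository's actual API; `train.py` is the authoritative reference:

```python
# Illustrative outline only; see train.py for the repository's actual code.
import os
import torch
import torch.nn as nn

def train(model, dataloader, config, device):
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=config["lr"])
    # Padding tokens should not contribute to the loss (pad index 0 assumed here).
    loss_fn = nn.CrossEntropyLoss(ignore_index=0)
    os.makedirs("weights", exist_ok=True)
    for epoch in range(config["num_epochs"]):
        for src, tgt_in, tgt_out in dataloader:  # shifted decoder inputs/targets
            src, tgt_in, tgt_out = src.to(device), tgt_in.to(device), tgt_out.to(device)
            logits = model(src, tgt_in)  # assumed shape: (batch, seq, vocab)
            loss = loss_fn(logits.reshape(-1, logits.size(-1)), tgt_out.reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        torch.save(model.state_dict(), f"weights/epoch_{epoch}.pt")
```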
For running inference and evaluating the model, use the provided Jupyter notebook:

```bash
jupyter notebook inference.ipynb
```
This notebook demonstrates how to load a trained model and perform inference on sample inputs.
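The general pattern for autoregressive inference with an encoder-decoder Transformer is greedy decoding: feed the source once, then extend the target one token at a time. The sketch below assumes a `model(src, tgt)` signature and hypothetical token IDs; the notebook shows the actual workflow:

```python
# Illustrative greedy-decoding sketch; the notebook is the authoritative reference.
import torch

@torch.no_grad()
def greedy_decode(model, src, sos_id, eos_id, max_len, device):
    model.eval()
    src = src.to(device)
    # Start the target sequence with the start-of-sequence token.
    tgt = torch.tensor([[sos_id]], device=device)
    for _ in range(max_len):
        logits = model(src, tgt)  # assumed shape: (1, tgt_len, vocab)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        tgt = torch.cat([tgt, next_id], dim=1)
        if next_id.item() == eos_id:  # stop once end-of-sequence is produced
            break
    return tgt.squeeze(0)

# A trained checkpoint would be loaded first, e.g.:
# model.load_state_dict(torch.load("weights/epoch_19.pt", map_location=device))
```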
```
transformers/
│-- config.py        # Configuration settings
│-- dataset.py       # Dataset loading and preprocessing
│-- model.py         # Transformer model implementation
│-- train.py         # Training script
│-- inference.ipynb  # Inference and evaluation notebook
│-- LICENSE          # License information
│-- README.md        # Project documentation
```
Contributions are welcome! If you have suggestions or improvements, please open an issue or submit a pull request.
This project is licensed under the Apache-2.0 License. See the LICENSE file for details.