This project provides a complete from-scratch implementation of the Transformer architecture in PyTorch. It covers the full architecture, the training procedure, and the inference pipeline.
- `transformers/tokenizer.py`: Implementation of the tokenizer for preprocessing text data
- `transformers/model.py`: Core Transformer model implementation including attention mechanisms, encoder, and decoder
- `transformers/training.py`: Training utilities and procedures
- `transformers/inference.py`: Functions for running inference with trained models
- `transformers/utils.py`: Helper functions and utilities
- `examples/`: Example usage scripts and notebooks
- `tests/`: Unit tests for the implementation
- Complete implementation of the Transformer architecture as described in "Attention is All You Need"
- Self-attention and multi-head attention mechanisms
- Positional encoding
- Layer normalization and residual connections
- Training procedures with customizable hyperparameters
- Inference pipeline for using trained models
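At the heart of the model is scaled dot-product attention, which each attention head applies to its projected queries, keys, and values. As a rough sketch (an illustration only, not necessarily the exact code in `transformers/model.py`):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    # q, k, v: (batch, heads, seq_len, d_k)
    d_k = q.size(-1)
    # Similarity of every query with every key, scaled to stabilize gradients
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        # Padding or future positions are set to -inf so softmax gives them zero weight
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    # Each output position is a weighted sum of the value vectors
    return torch.matmul(weights, v), weights
```

Multi-head attention runs this computation in parallel over several independently projected sets of queries, keys, and values, then concatenates the per-head outputs.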
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd Transformer_from_scratch_PyTorch
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Explore the implementation:

  - See the model implementation in `transformers/model.py`
  - Check out example usage in the `examples/` directory
  - Read through the code documentation for detailed explanations

- Train your own model (a minimal training-loop sketch follows this list):

  ```bash
  python -m transformers.training --config config.json
  ```

- Run inference (a greedy-decoding sketch also follows this list):

  ```bash
  python -m transformers.inference --model model.pt --input "Your input text"
  ```
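The hyperparameters and keys read from config.json are defined in `transformers/training.py`. Purely for orientation, a minimal sequence-to-sequence training epoch looks roughly like the sketch below; `train_epoch` and the `model(src, tgt)` call signature are illustrative assumptions, not this repository's actual API:

```python
import torch
import torch.nn as nn

def train_epoch(model, dataloader, optimizer, pad_idx, device="cpu"):
    # Hypothetical helper: `model` and `dataloader` stand in for whatever
    # transformers/training.py actually constructs from config.json.
    model.train()
    # Padding positions should not contribute to the loss
    criterion = nn.CrossEntropyLoss(ignore_index=pad_idx)
    total_loss = 0.0
    for src, tgt in dataloader:
        src, tgt = src.to(device), tgt.to(device)
        # Teacher forcing: feed the target shifted right, predict the next token
        logits = model(src, tgt[:, :-1])                      # (batch, tgt_len-1, vocab)
        loss = criterion(logits.reshape(-1, logits.size(-1)),
                         tgt[:, 1:].reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(dataloader)
```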
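Similarly, inference with an encoder-decoder Transformer is usually a token-by-token decoding loop. The sketch below shows greedy decoding; the `encode`/`decode` method names and the special-token indices are hypothetical placeholders, not necessarily what `transformers/inference.py` exposes:

```python
import torch

@torch.no_grad()
def greedy_decode(model, src, bos_idx, eos_idx, max_len=50):
    # Assumes a hypothetical encoder-decoder interface: model.encode(src) returns
    # the encoder memory and model.decode(tgt, memory) returns next-token logits.
    model.eval()
    memory = model.encode(src)
    # Start every sequence in the batch with the beginning-of-sequence token
    ys = torch.full((src.size(0), 1), bos_idx, dtype=torch.long, device=src.device)
    for _ in range(max_len - 1):
        logits = model.decode(ys, memory)                 # (batch, cur_len, vocab)
        next_token = logits[:, -1].argmax(dim=-1, keepdim=True)
        ys = torch.cat([ys, next_token], dim=1)           # append the most likely token
        if (next_token == eos_idx).all():                 # stop once every sequence ended
            break
    return ys
```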
This implementation follows the original Transformer architecture with:
- Multi-head attention mechanisms
- Position-wise feed-forward networks
- Layer normalization
- Residual connections
- Positional encoding
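For instance, the original paper defines sinusoidal positional encoding as PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)); a self-contained sketch of that table, which may differ in detail from the repository's code, is:

```python
import math
import torch

def sinusoidal_positional_encoding(max_len, d_model):
    # Assumes an even d_model; returns a (max_len, d_model) table that is added
    # to the token embeddings before the first encoder/decoder layer.
    position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)          # (max_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float)
                         * (-math.log(10000.0) / d_model))                    # (d_model/2,)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
    return pe
```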
For detailed explanations of each component, refer to the documentation and comments in the code.
Feel free to contribute by submitting issues or pull requests. Contributions to improve the implementation, documentation, or examples are welcome.
This project is licensed under [Your License Here]. Replace this section with the appropriate license information.