A hands-on, educational implementation of a modern, LLaMA-style Large Language Model (LLM) to learn Transformer fundamentals and architecture from first principles using PyTorch.
- Project Goal
- Key Architectural Concepts
- Setup and Usage
- Project Structure
- Detailed Documentation
- Project Blog Series
The objective of this repository is not to create a production-ready LLM, but to serve as a detailed, educational implementation of a modern Transformer-based architecture inspired by Meta's LLaMA model. By building each core component from scratch in PyTorch, this project explores the internal mechanics of Large Language Models and serves as a portfolio piece demonstrating a deep, first-principles understanding of foundational LLM technologies.
This implementation is based on a modern, decoder-only Transformer architecture and includes several state-of-the-art optimizations commonly used in LLaMA and other leading LLMs (illustrative PyTorch sketches of these components follow the list below):
- Multi-Head Attention: The core mechanism allowing the model to weigh the importance of different tokens within a sequence.
- Grouped-Query Attention (GQA): An efficient variant of multi-head attention used in LLaMA models, in which groups of query heads share a single key/value head, reducing the compute and memory (especially KV-cache) cost of attention.
- Rotary Positional Embeddings (RoPE): A sophisticated method for encoding the relative position of tokens, adopted by models like LLaMA.
- RMS Pre-Normalization: Root-mean-square layer normalization (RMSNorm) applied before each sub-layer (pre-norm), which stabilizes training in transformer-based LLMs.
- Feed-Forward Networks: The position-wise network that transforms the contextualized embeddings produced by the attention block, allowing the model to learn rich language representations.
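To make these concepts concrete, the sketches below show minimal PyTorch versions of the main components. They are simplified illustrations, not the notebook code: function and class names, tensor layouts, and default hyperparameters are assumptions made for brevity. First, grouped-query attention, where each group of query heads shares one key/value head (causal masking and dropout are omitted):

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v, n_kv_heads: int):
    """Illustrative GQA.
    q: (batch, n_heads, seq_len, head_dim)
    k, v: (batch, n_kv_heads, seq_len, head_dim) with n_kv_heads < n_heads
    """
    n_heads = q.shape[1]
    group_size = n_heads // n_kv_heads            # query heads per shared K/V head
    # Repeat each K/V head so it lines up with its group of query heads.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v          # (batch, n_heads, seq_len, head_dim)

# Example: 8 query heads sharing 2 K/V heads.
out = grouped_query_attention(torch.randn(1, 8, 16, 64),
                              torch.randn(1, 2, 16, 64),
                              torch.randn(1, 2, 16, 64), n_kv_heads=2)
```

With n_kv_heads equal to n_heads this reduces to standard multi-head attention; with fewer K/V heads, only n_kv_heads key/value projections need to be stored, which is where most of GQA's memory savings come from during inference.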
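Next, a sketch of RoPE, assuming queries and keys shaped (batch, seq_len, n_heads, head_dim) and the standard base of 10000; the interleaved even/odd pairing follows the original RoPE formulation, though implementations (including the one in the notebooks) may organize the rotation differently:

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate each (even, odd) pair of features by a position-dependent angle.
    x: (batch, seq_len, n_heads, head_dim); head_dim must be even."""
    _, seq_len, _, head_dim = x.shape
    # One frequency per feature pair, decaying geometrically with the pair index.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, device=x.device).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len, device=x.device).float(), inv_freq)
    cos = angles.cos()[None, :, None, :]          # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    rotated = torch.stack((x_even * cos - x_odd * sin,
                           x_even * sin + x_odd * cos), dim=-1)
    return rotated.flatten(-2)                    # interleave the pairs back into head_dim
```

Because the rotation angle depends only on a token's position, the dot product between a rotated query and key depends only on their relative distance, which is what makes RoPE a relative positional encoding.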
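RMSNorm itself is only a few lines; the sketch below normalizes by the root mean square of the features instead of subtracting a mean and dividing by a standard deviation as LayerNorm does (the eps of 1e-6 is a common default, not necessarily the value used in the notebooks):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer normalization: no mean subtraction, no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))   # learnable per-feature gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```

In a pre-norm block, this is applied to the input of the attention and feed-forward sub-layers rather than to their outputs.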
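Finally, a sketch of a LLaMA-style feed-forward block. LLaMA uses a SwiGLU gate in place of the classic two-layer ReLU MLP; the class and parameter names here are illustrative, and the notebook (03_feed_forward_network.ipynb) may implement a different variant:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    """Position-wise feed-forward block with SiLU gating (SwiGLU)."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gate the up-projection with SiLU, then project back to the model dimension.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```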
To explore this project and learn the LLaMA architecture concepts in PyTorch, run the Jupyter notebooks, which break down each component of the model step by step.
- Clone the repository:
git clone https://github.com/adarshn656/llama-from-scratch.git
cd llama-from-scratch
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Run the notebooks:
Open the notebooks/ directory and run the notebooks sequentially in an environment such as VS Code or Jupyter Lab.
.
├── assets/ # assets used in the documentation files
├── docs/
│ ├── multi_head_attention.md # In-depth explanation of the attention mechanism in Transformers
│ └── rope_explained.md # In-depth explanation of Rotary Positional Embedding (RoPE)
├── notebooks/
│ ├── 01_tokenizer.ipynb
│ ├── 02_multi_head_attention.ipynb
│ └── 03_feed_forward_network.ipynb
├── .gitignore
├── LICENSE
├── README.md
└── requirements.txt
For a deeper, mathematical breakdown of the core LLaMA-inspired components, please refer to the documents in the docs/ folder:
- docs/multi_head_attention.md: In-depth explanation of the attention mechanism in Transformers
- docs/rope_explained.md: In-depth explanation of Rotary Positional Embeddings (RoPE)
I am also documenting my learning journey and explaining these concepts in a series of articles on Medium.
This project is a core part of my learning journey into the fundamentals of LLMs. As I'm still learning, I'm very open to discussing concepts, clarifying methods, or improving the implementation.
If you have any questions, feedback, or suggestions, please feel free to open an issue.