A hands-on, educational implementation of a modern, LLaMA-style Large Language Model (LLM) to learn Transformer fundamentals and architecture from first principles using PyTorch.
- Project Goal
- Key Architectural Concepts
- Setup and Usage
- Project Structure
- Detailed Documentation
- Project Blog Series
The objective of this repository is not to create a production-ready LLM, but to serve as a detailed, educational implementation of a modern Transformer-based architecture inspired by Meta's LLaMA model. By building each core component from scratch in PyTorch, this project explores the internal mechanics of Large Language Models and serves as a portfolio piece demonstrating a deep, first-principles understanding of foundational LLM technologies.
This implementation is based on a modern, decoder-only Transformer architecture and includes several state-of-the-art optimizations commonly used in LLaMA and other leading LLMs (illustrative PyTorch sketches of these components follow the list below):
- Multi-Head Attention: The core mechanism allowing the model to weigh the importance of different tokens within a sequence.
- Grouped-Query Attention (GQA): An efficient variant of multi-head attention used in LLaMA models, in which groups of query heads share a single key/value head, reducing the compute and memory (especially KV-cache) cost of attention.
- Rotary Positional Embeddings (RoPE): A sophisticated method for encoding the relative position of tokens, adopted by models like LLaMA.
- RMS Pre-Normalization: Root-mean-square layer normalization (RMSNorm) applied before each sub-layer (pre-norm), which stabilizes training in transformer-based LLMs.
- Feed-Forward Networks: The position-wise network that transforms the contextualized embeddings produced by the attention block, allowing the model to learn rich language representations.
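To make these concepts concrete, the sketches below show minimal PyTorch versions of the main components. They are simplified illustrations, not the notebook code: function and class names, tensor layouts, and default hyperparameters are assumptions made for brevity. First, grouped-query attention, where each group of query heads shares one key/value head (causal masking and dropout are omitted):

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v, n_kv_heads: int):
    """Illustrative GQA.
    q: (batch, n_heads, seq_len, head_dim)
    k, v: (batch, n_kv_heads, seq_len, head_dim) with n_kv_heads < n_heads
    """
    n_heads = q.shape[1]
    group_size = n_heads // n_kv_heads            # query heads per shared K/V head
    # Repeat each K/V head so it lines up with its group of query heads.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v          # (batch, n_heads, seq_len, head_dim)

# Example: 8 query heads sharing 2 K/V heads.
out = grouped_query_attention(torch.randn(1, 8, 16, 64),
                              torch.randn(1, 2, 16, 64),
                              torch.randn(1, 2, 16, 64), n_kv_heads=2)
```

With n_kv_heads equal to n_heads this reduces to standard multi-head attention; with fewer K/V heads, only n_kv_heads key/value projections need to be stored, which is where most of GQA's memory savings come from during inference.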
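Next, a sketch of RoPE, assuming queries and keys shaped (batch, seq_len, n_heads, head_dim) and the standard base of 10000; the interleaved even/odd pairing follows the original RoPE formulation, though implementations (including the one in the notebooks) may organize the rotation differently:

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate each (even, odd) pair of features by a position-dependent angle.
    x: (batch, seq_len, n_heads, head_dim); head_dim must be even."""
    _, seq_len, _, head_dim = x.shape
    # One frequency per feature pair, decaying geometrically with the pair index.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, device=x.device).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len, device=x.device).float(), inv_freq)
    cos = angles.cos()[None, :, None, :]          # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    rotated = torch.stack((x_even * cos - x_odd * sin,
                           x_even * sin + x_odd * cos), dim=-1)
    return rotated.flatten(-2)                    # interleave the pairs back into head_dim
```

Because the rotation angle depends only on a token's position, the dot product between a rotated query and key depends only on their relative distance, which is what makes RoPE a relative positional encoding.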
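RMSNorm itself is only a few lines; the sketch below normalizes by the root mean square of the features instead of subtracting a mean and dividing by a standard deviation as LayerNorm does (the eps of 1e-6 is a common default, not necessarily the value used in the notebooks):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer normalization: no mean subtraction, no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))   # learnable per-feature gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```

In a pre-norm block, this is applied to the input of the attention and feed-forward sub-layers rather than to their outputs.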
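Finally, a sketch of a LLaMA-style feed-forward block. LLaMA uses a SwiGLU gate in place of the classic two-layer ReLU MLP; the class and parameter names here are illustrative, and the notebook (03_feed_forward_network.ipynb) may implement a different variant:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    """Position-wise feed-forward block with SiLU gating (SwiGLU)."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gate the up-projection with SiLU, then project back to the model dimension.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```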
To explore this project and learn the LLaMA architecture concepts in PyTorch, run the Jupyter notebooks, which break down each component of the model step by step.
- Clone the repository:
git clone https://github.com/adarshn656/llama-from-scratch.git
cd llama-from-scratch
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Run the notebooks:
Open the notebooks/ directory and run the notebooks sequentially in an environment such as VS Code or Jupyter Lab.
.
├── assets/ # assets used in the documentation files
├── docs/
│ ├── multi_head_attention.md # In-depth explanation of the attention mechanism in Transformers
│ └── rope_explained.md # In-depth explanation of Rotary Positional Embedding (RoPE)
├── notebooks/
│ ├── 01_tokenizer.ipynb
│ ├── 02_multi_head_attention.ipynb
│ └── 03_feed_forward_network.ipynb
├── .gitignore
├── LICENSE
├── README.md
└── requirements.txt
For a deeper, mathematical breakdown of the core LLaMA-inspired components, please refer to the documents in the docs/ folder:
- docs/multi_head_attention.md: In-depth explanation of the attention mechanism in Transformers
- docs/rope_explained.md: In-depth explanation of Rotary Positional Embeddings (RoPE)
I am also documenting my learning journey and explaining these concepts in a series of articles on Medium.
This project is a core part of my learning journey into the fundamentals of LLMs. As I'm still learning, I'm very open to discussing concepts, clarifying methods, or improving the implementation.
If you have any questions, feedback, or suggestions, please feel free to open an issue.