llm-from-scratch-guide
Build a ChatGPT-like LLM from scratch using PyTorch step by step
Best use case
llm-from-scratch-guide is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Build a ChatGPT-like LLM from scratch using PyTorch step by step
Teams using llm-from-scratch-guide should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/llm-from-scratch-guide/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How llm-from-scratch-guide Compares
| Feature / Agent | llm-from-scratch-guide | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Build a ChatGPT-like LLM from scratch using PyTorch step by step
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# LLM From Scratch Guide
## Overview
LLMs-from-scratch is a comprehensive educational repository with over 87,000 stars on GitHub that teaches you how to build a ChatGPT-like large language model from the ground up using PyTorch. Created by Sebastian Raschka, a machine learning researcher and author, the project provides a complete pipeline covering data preparation, tokenization, attention mechanisms, pretraining, and instruction finetuning.
Unlike tutorials that treat LLMs as black boxes, this project demystifies every component by walking through the full implementation. Each chapter corresponds to a Jupyter notebook with clear explanations, diagrams, and runnable code. The repository accompanies the book "Build a Large Language Model (From Scratch)" and serves as a standalone learning resource for researchers and engineers who want deep understanding of transformer-based language models.
The project is particularly valuable for academic researchers who need to understand the internals of LLMs for their own research, whether that involves modifying architectures, running ablation studies, or developing domain-specific language models for scientific applications.
## Installation and Setup
Clone the repository and set up a Python environment with the required dependencies:
```bash
git clone https://github.com/rasbt/LLMs-from-scratch.git
cd LLMs-from-scratch
# Create a virtual environment
python -m venv llm-env
source llm-env/bin/activate
# Install dependencies
pip install -r requirements.txt
```
The project requires Python 3.10+ and PyTorch 2.0+. For GPU-accelerated training, ensure you have CUDA installed. The notebooks can also run on CPU for smaller model configurations, though training times will be significantly longer.
Key dependencies include:
- **PyTorch** >= 2.0 for model implementation and training
- **tiktoken** for BPE tokenization compatible with OpenAI models
- **matplotlib** for training visualization
- **jupyter** for interactive notebook execution
## Core Learning Pipeline
The project is organized into sequential chapters that build on each other:
### Chapter 1: Understanding Large Language Models
Covers the conceptual foundations of LLMs, including the transformer architecture, the difference between encoder and decoder models, and how pretraining and finetuning work at a high level.
### Chapter 2: Working with Text Data
Implements text tokenization from scratch, including byte-pair encoding (BPE). You build a custom tokenizer and learn how text is converted to numerical representations:
```python
# Tokenization example from the project
import tiktoken
tokenizer = tiktoken.get_encoding("gpt2")
text = "Large language models are fascinating."
token_ids = tokenizer.encode(text)
decoded = tokenizer.decode(token_ids)
```
### Chapter 3: Coding Attention Mechanisms
Implements self-attention, multi-head attention, and causal (masked) attention from scratch. This is the core computational primitive of transformers:
```python
# Simplified multi-head attention
class MultiHeadAttention(nn.Module):
def __init__(self, d_in, d_out, context_length, num_heads, dropout=0.0):
super().__init__()
self.W_query = nn.Linear(d_in, d_out, bias=False)
self.W_key = nn.Linear(d_in, d_out, bias=False)
self.W_value = nn.Linear(d_in, d_out, bias=False)
self.out_proj = nn.Linear(d_out, d_out)
self.num_heads = num_heads
self.head_dim = d_out // num_heads
```
### Chapter 4: Implementing a GPT Model
Assembles the full GPT architecture using the attention mechanism, layer normalization, feed-forward networks, and positional embeddings.
### Chapter 5: Pretraining on Unlabeled Data
Trains the GPT model on a text corpus using next-token prediction. Covers the training loop, loss computation, learning rate scheduling, and gradient clipping.
### Chapter 6: Finetuning for Text Classification
Adapts the pretrained model for downstream classification tasks, demonstrating how to add a classification head and finetune on labeled data.
### Chapter 7: Instruction Finetuning
Converts the pretrained model into an instruction-following assistant using supervised finetuning on instruction-response pairs, similar to how ChatGPT is trained.
## Research Applications
This resource is invaluable for several research scenarios:
- **Architecture ablation studies**: Modify individual components (attention heads, layer count, embedding dimensions) and measure their impact on performance
- **Domain-specific pretraining**: Use the pipeline to pretrain models on scientific corpora (biomedical literature, physics papers, chemical databases)
- **Tokenizer research**: Experiment with different tokenization strategies for specialized vocabularies
- **Efficient training methods**: Test techniques like gradient accumulation, mixed precision, and learning rate warmup
- **Interpretability research**: Inspect attention patterns and intermediate representations at every layer
For researchers working with limited compute, the project includes configurations for small models (124M parameters) that can be trained on a single GPU in reasonable time, making it practical for experimentation and prototyping.
## Integration with Research Workflows
Combine this project with other tools in your research stack:
- Use **Weights & Biases** or **MLflow** for experiment tracking during pretraining runs
- Export trained models to **Hugging Face Hub** for sharing and reproducibility
- Integrate with **PyTorch Lightning** for distributed training across multiple GPUs
- Apply **LoRA** or **QLoRA** adapters from the bonus chapters for parameter-efficient finetuning
The bonus materials in the repository cover additional topics like DPO (Direct Preference Optimization), loading pretrained weights from Hugging Face, and converting models between different formats.
## References
- Repository: https://github.com/rasbt/LLMs-from-scratch
- Book: "Build a Large Language Model (From Scratch)" by Sebastian Raschka (Manning, 2024)
- Author's blog: https://sebastianraschka.com/blog/
- PyTorch documentation: https://pytorch.org/docs/stable/Related Skills
thuthesis-guide
Write Tsinghua University theses using the ThuThesis LaTeX template
thesis-writing-guide
Templates, formatting rules, and strategies for thesis and dissertation writing
thesis-template-guide
Set up LaTeX templates for PhD and Master's thesis documents
sjtuthesis-guide
Write SJTU theses using the SJTUThesis LaTeX template with full compliance
novathesis-guide
LaTeX thesis template supporting multiple universities and formats
graphical-abstract-guide
Create SVG graphical abstracts for journal paper submissions
beamer-presentation-guide
Guide to creating academic presentations with LaTeX Beamer
plagiarism-detection-guide
Use plagiarism detection tools and ensure manuscript originality
paper-polish-guide
Review and polish LaTeX research papers for clarity and style
grammar-checker-guide
Use grammar and style checking tools to polish academic manuscripts
conciseness-editing-guide
Eliminate wordiness and redundancy in academic prose for clarity
academic-translation-guide
Academic translation, post-editing, and Chinglish correction guide