Building GPT-2 from Scratch
A complete walkthrough of GPT-2's architecture — tokenization, causal self-attention, transformer blocks, weight tying — implemented in pure PyTorch and trained locally on your machine.
Notes on AI, machine learning, and engineering. Deep-dives into the systems and ideas that matter.
How modern deep learning frameworks compute gradients automatically. We build a working autograd engine in ~80 lines of Python — the same core idea behind PyTorch.