Transformers-From-First-Principles on My Learning Notes

Series: https://learning-notes-dz2.pages.dev/series/transformers-from-first-principles/
Last built: Tue, 16 Jun 2026 07:19:01 +0000

Transformers from First Principles — Part 2: What Scale Reveals
https://learning-notes-dz2.pages.dev/posts/2026-02-20/
Fri, 20 Feb 2026 00:00:00 +0700
Sparse attention patterns, head specialization, rotary embeddings, gated attention, and the modern efficiency tricks that make large transformers actually trainable.

Transformers from First Principles — Part 1: Attention Is All You Need (Really)
https://learning-notes-dz2.pages.dev/posts/2026-02-08/
Sun, 08 Feb 2026 00:00:00 +0700
A first-principles walkthrough of the Transformer (self-attention, positional encoding, multi-head attention), with the math that makes it work.