Transformers from First Principles — Part 2: What Scale Reveals

Sparse attention patterns, head specialization, rotary embeddings, gated attention, and the modern efficiency tricks that make large transformers actually trainable.

February 20, 2026 | Estimated reading time: 7 min | 1487 words | Author: khanhnn

β-VAE and the Emergence of Disentanglement

A single Greek letter in front of the KL term changes what the VAE learns. We look at β-VAE as a rate-distortion trade-off, an information bottleneck, and a simple probe into disentangled representations.

February 10, 2026 | Estimated reading time: 8 min | 1623 words | Author: khanhnn

Transformers from First Principles — Part 1: Attention Is All You Need (Really)

A first-principles walkthrough of the Transformer — self-attention, positional encoding, multi-head attention — with the math that makes it work.

February 8, 2026 | Estimated reading time: 8 min | 1533 words | Author: khanhnn

Conditional VAE (CVAE): Learning to Generate with Conditions

We extend the VAE into a controllable generative model by conditioning every term of the ELBO on a label y.

January 25, 2026 | Estimated reading time: 8 min | 1514 words | Author: khanhnn

Safety Neurons: 5% of Your Model Controls 90% of Safety

Mechanistic interpretability meets alignment — how researchers found that a tiny fraction of neurons are responsible for almost all safety behavior in LLMs, and what that means.

January 18, 2026 | Estimated reading time: 5 min | 1033 words | Author: khanhnn

Dissecting the VAE Objective: KL, Reconstruction, and the Reparameterization Trick

We open the ELBO, compute each term, and meet the reparameterization trick — the idea that lets us backpropagate through randomness.

January 10, 2026 | Estimated reading time: 8 min | 1589 words | Author: khanhnn

A Curated Guide to LLMs, Reinforcement Learning, and AI Safety

Books, papers, conferences, and researchers — a personal resource list for anyone going deep into LLMs, RL, and AI safety.

December 28, 2025 | Estimated reading time: 8 min | 1495 words | Author: khanhnn

Variational Inference: Cracking the Intractable Integral

Variational Inference recasts the computation of intractable integrals as a solvable optimization problem, providing the mathematical foundation for modern generative models like VAEs.

December 20, 2025 | Estimated reading time: 9 min | 1869 words | Author: khanhnn

Latent Variable Models: A Probabilistic Foundation

From PCA to Probabilistic PCA and general Latent Variable Models: the probabilistic lens that seeds VAEs.

October 28, 2025 | Estimated reading time: 7 min | 1401 words | Author: khanhnn

An Overview of Generative Model Paradigms

A summary of explicit, implicit, and score-based generative models.

December 24, 2024 | Estimated reading time: 8 min | 1554 words | Author: khanhnn