Sparse Autoencoders: The Swiss Army Knife of Interpretability

SAEs went from a niche interpretability tool to a dominant research theme in one year. Here is where they’re being applied, what they reveal, and which fundamental limitations nobody has solved yet.

April 8, 2026 | Estimated reading time: 6 min | 1233 words | Author: khanhnn

Safety Neurons: 5% of Your Model Controls 90% of Safety

Mechanistic interpretability meets alignment: how researchers found that a tiny fraction of neurons are responsible for almost all safety behavior in LLMs, and what that means.

January 18, 2026 | Estimated reading time: 5 min | 1033 words | Author: khanhnn