My Learning Notes

https://learning-notes-dz2.pages.dev/
Recent content on My Learning Notes
Tue, 16 Jun 2026 03:17:43 +0000
Paper Roundup: LLM Safety & RLHF at NeurIPS 2025 and ICLR 2026 https://learning-notes-dz2.pages.dev/posts/2026-04-29/ Wed, 29 Apr 2026 00:00:00 +0700 A curated list of papers on alignment, preference optimization, mechanistic interpretability, and reasoning from the two biggest ML conferences this cycle — with personal takes on the ones that matter most.
Ara: What If Research Papers Were Executable? https://learning-notes-dz2.pages.dev/posts/2026-04-28/ Tue, 28 Apr 2026 00:00:00 +0700 A deep look at Agent-Native Research Artifacts (Ara) — a proposed replacement for academic PDFs that packages research as machine-executable knowledge bundles. What it gets right, what it gets wrong, and why it matters for AI-assisted research.
Pluralistic Alignment: One Model, Many Values https://learning-notes-dz2.pages.dev/posts/2026-04-15/ Wed, 15 Apr 2026 00:00:00 +0700 RLHF optimizes for an average human preference — but humans disagree. The Artificial Hivemind problem, counterfactual alignment, and why one-size-fits-all safety is a design choice we should question.
Sparse Autoencoders: The Swiss Army Knife of Interpretability https://learning-notes-dz2.pages.dev/posts/2026-04-08/ Wed, 08 Apr 2026 00:00:00 +0700 SAEs went from niche interpretability tool to dominant research theme in one year. Where they’re being applied, what they reveal, and the fundamental limitations nobody has solved yet.
SafeDPO and Friends: Preference Optimization That Doesn't Sacrifice Safety https://learning-notes-dz2.pages.dev/posts/2026-03-30/ Mon, 30 Mar 2026 00:00:00 +0700 DPO has problems — preference reversals, reward degradation, and a safety-helpfulness trade-off. Here’s how SafeDPO, RePO, and other recent variants are fixing them.
Does RL Actually Make LLMs Reason Better? https://learning-notes-dz2.pages.dev/posts/2026-03-28/ Sat, 28 Mar 2026 00:00:00 +0700 The evidence is more complicated than the hype suggests. RL improves sampling efficiency but may not expand reasoning capacity — and longer chains of thought don’t always help.
RLHF Is Just Divergence Estimation in Disguise https://learning-notes-dz2.pages.dev/posts/2026-03-22/ Sun, 22 Mar 2026 00:00:00 +0700 A unifying view of RLHF, DPO, and Constitutional AI — they’re all estimating the divergence between safe and unsafe output distributions. Plus a clean derivation of why DPO works.
From Policy Gradient to PPO — Part 2: Trust Regions, PPO, and GRPO https://learning-notes-dz2.pages.dev/posts/2026-03-18/ Wed, 18 Mar 2026 00:00:00 +0700 How trust regions stabilize policy optimization, why PPO became the default for RLHF, and how GRPO eliminates the critic entirely.
From Policy Gradient to PPO — Part 1: Foundations https://learning-notes-dz2.pages.dev/posts/2026-03-05/ Thu, 05 Mar 2026 00:00:00 +0700 MDPs, value functions, the REINFORCE algorithm, actor-critic methods, and generalized advantage estimation — the RL foundations you need before understanding RLHF.
VAE Variants and Modern Interpretations https://learning-notes-dz2.pages.dev/posts/2026-02-25/ Wed, 25 Feb 2026 00:00:00 +0700 A survey of where the VAE idea went after 2014 — VQ-VAE, hierarchical VAEs, adversarial hybrids, flow-based posteriors — and what the VAE really gave us beyond a specific architecture.
Transformers from First Principles — Part 2: What Scale Reveals https://learning-notes-dz2.pages.dev/posts/2026-02-20/ Fri, 20 Feb 2026 00:00:00 +0700 Sparse attention patterns, head specialization, rotary embeddings, gated attention, and the modern efficiency tricks that make large transformers actually trainable.
β-VAE and the Emergence of Disentanglement https://learning-notes-dz2.pages.dev/posts/2026-02-10/ Tue, 10 Feb 2026 00:00:00 +0700 A single Greek letter in front of the KL term changes what the VAE learns. We look at β-VAE as a rate-distortion trade-off, an information bottleneck, and a simple probe into disentangled representations.
Transformers from First Principles — Part 1: Attention Is All You Need (Really) https://learning-notes-dz2.pages.dev/posts/2026-02-08/ Sun, 08 Feb 2026 00:00:00 +0700 A first-principles walkthrough of the Transformer — self-attention, positional encoding, multi-head attention — with the math that makes it work.
Conditional VAE (CVAE): Learning to Generate with Conditions https://learning-notes-dz2.pages.dev/posts/2026-01-25/ Sun, 25 Jan 2026 00:00:00 +0700 We extend the VAE into a controllable generative model by adding a condition y into every term of the ELBO.
Safety Neurons: 5% of Your Model Controls 90% of Safety https://learning-notes-dz2.pages.dev/posts/2026-01-18/ Sun, 18 Jan 2026 00:00:00 +0700 Mechanistic interpretability meets alignment — how researchers found that a tiny fraction of neurons are responsible for almost all safety behavior in LLMs, and what that means.
Dissecting the VAE Objective: KL, Reconstruction, and the Reparameterization Trick https://learning-notes-dz2.pages.dev/posts/2026-01-10/ Sat, 10 Jan 2026 00:00:00 +0700 We open the ELBO, compute each term, and meet the reparameterization trick — the idea that lets us backpropagate through randomness.
A Curated Guide to LLMs, Reinforcement Learning, and AI Safety https://learning-notes-dz2.pages.dev/posts/2025-12-28/ Sun, 28 Dec 2025 00:00:00 +0700 Books, papers, conferences, and researchers — a personal resource list for anyone going deep into LLMs, RL, and AI safety.
Variational Inference: Cracking the Intractable Integral https://learning-notes-dz2.pages.dev/posts/2025-12-20/ Sat, 20 Dec 2025 00:00:00 +0700 Variational Inference transforms the impossible task of computing intractable integrals into a solvable optimization problem, providing the mathematical foundation for modern generative models like VAEs.
Latent Variable Models: A Probabilistic Foundation https://learning-notes-dz2.pages.dev/posts/2025-10-28/ Tue, 28 Oct 2025 00:00:00 +0700 From PCA to Probabilistic PCA and general Latent Variable Models: the probabilistic lens that seeds VAEs.
An overview on generative models paradigms https://learning-notes-dz2.pages.dev/posts/2024-12-24/ Tue, 24 Dec 2024 01:12:07 +0700 A summary of explicit, implicit and score-based generative models.
Information Theory https://learning-notes-dz2.pages.dev/posts/2024-10-05/ Sat, 05 Oct 2024 11:00:00 +0700 Information theory essentials: entropy, cross-entropy, joint/conditional entropy, KL divergence, mutual information.
The Curse of Dimensionality and Decision Theory https://learning-notes-dz2.pages.dev/posts/2024-09-02/ Mon, 02 Sep 2024 11:00:00 +0700 High-dimensional data pitfalls (CoD) and core decision theory: risk, posterior-based rules, reject option.
Transformations of random variables https://learning-notes-dz2.pages.dev/posts/2024-08-15/ Thu, 15 Aug 2024 01:12:07 +0700 Change-of-variables for PDFs: scalar and multivariate cases, Jacobian determinant, convolution and CLT.
Bayesian Probability https://learning-notes-dz2.pages.dev/posts/2024-07-21/ Sun, 21 Jul 2024 01:12:07 +0700 Bayesian probability: quantifying uncertainty, Bayes’ rule, prior/likelihood/posterior, marginal probability.
Basic Probability https://learning-notes-dz2.pages.dev/posts/2024-07-12/ Fri, 12 Jul 2024 01:12:07 +0700 Probability fundamentals: rules, PDFs, expectation, variance, covariance, Gaussian distribution.
Polynomial curve fitting https://learning-notes-dz2.pages.dev/posts/2024-07-05/ Fri, 05 Jul 2024 01:12:07 +0700 Polynomial regression from least squares to Bayesian view: closed-form, regularization, predictive uncertainty.
Diffusion Models https://learning-notes-dz2.pages.dev/posts/2024-06-11/ Tue, 11 Jun 2024 01:12:07 +0700 Diffusion Models (DMs) consist of two processes: forward and backward. Forward process, general idea: degrade the input data iteratively with noise, forward in time (i.e., as $t$ increases). Given an image $x_0 \sim q(x_0)$, where $q(x_0)$ is the data distribution, the forward process gradually adds Gaussian noise over $T$ time steps and produces the latent $x_T$. At each time step $t$, we sample $x_t$ from the distribution $\mathcal{N}(\sqrt{1 - \beta_t} x_{t-1}, \beta_t I)$, where the hyper-parameters $0 < \beta_{1:T} < 1$ set the variance of the noise added at each time step.
Determinant of matrices, eigenvalues and eigenvectors https://learning-notes-dz2.pages.dev/posts/2021-08-21/ Sat, 21 Aug 2021 01:12:07 +0700 Determinants, eigenvalues, eigenvectors: geometric meaning, finding methods, and linear transformation essence.
Span, basis, and dimension https://learning-notes-dz2.pages.dev/posts/2021-08-07/ Sat, 07 Aug 2021 01:12:07 +0700 Linear independence, span, basis, dimension: fundamental concepts for vector spaces and subspaces.
The four fundamental subspaces in Linear Algebra https://learning-notes-dz2.pages.dev/posts/2021-08-14/ Sat, 07 Aug 2021 01:12:07 +0700 "This is really the heart of this approach to linear algebra, to see these four subspaces, how they are related." - Prof. Gilbert Strang
Solving Ax = b https://learning-notes-dz2.pages.dev/posts/2021-08-01/ Sun, 01 Aug 2021 01:12:07 +0700 Solving Ax=b: conditions for solutions, complete solution (particular + nullspace), rank relationships.
Nullspace and solving Ax=0 https://learning-notes-dz2.pages.dev/posts/2021-07-27/ Tue, 27 Jul 2021 01:12:07 +0700 Nullspace and solving Ax=0: special solutions, free variables, reduced row echelon form.
Echelon Form and Rank of a matrix https://learning-notes-dz2.pages.dev/posts/2021-07-25/ Sun, 25 Jul 2021 01:12:07 +0700 Echelon form and matrix rank: row elimination, leading elements, and solving linear systems.
Vector spaces and subspaces https://learning-notes-dz2.pages.dev/posts/2021-07-23/ Fri, 23 Jul 2021 01:12:07 +0700 Vector spaces, subspaces, column space: 8 axioms, subspace properties, and linear combinations.
Vanishing Gradients https://learning-notes-dz2.pages.dev/posts/2021-07-21/ Wed, 21 Jul 2021 01:12:07 +0700 What are vanishing gradients, why do they occur, and how can they be resolved?
Basic concepts in Linear Algebra https://learning-notes-dz2.pages.dev/posts/2021-07-20/ Tue, 20 Jul 2021 01:12:07 +0700 Basic concepts of Linear Algebra: data types, notations, and so on