Paper Roundup: LLM Safety & RLHF at NeurIPS 2025 and ICLR 2026
A curated list of papers from the two biggest ML conferences this cycle, covering alignment, preference optimization, mechanistic interpretability, and reasoning, with personal takes on the ones that matter most.