Activation steering and representation engineering for LLM safety: a multi-paper review
Plain-English walkthrough of ActAdd, Representation Engineering, Contrastive Activation Addition, and Anthropic's Persona Vectors — what each paper proves, where…

























































