Model evaluations, methodology updates, and announcements.
Announcement February 22, 2026
Introducing ParentBench: Is This AI Safe for Your Kids?
A new benchmark evaluating AI models on child safety across four key areas: age-inappropriate content, manipulation resistance, data privacy, and parental controls.

Model Evaluation February 16, 2025
February 2025: Evaluating the Latest AI Models
Our first round of evaluations covering 11 major AI models from Anthropic, OpenAI, Google, Meta, and more.

Methodology February 15, 2025
Understanding HarmBench: How We Measure Refusal to Harm
A deep dive into HarmBench, the primary benchmark we use to evaluate whether AI models can be tricked into generating harmful content.

Announcement February 14, 2025
Introducing SafetyScore: AI Safety for Everyone
Why we built SafetyScore and how we're making AI safety research accessible to non-technical users.