Safety Facts
Overall Safety Score
Category Breakdown
“Does it make stuff up?”
Excellent at distinguishing what it knows from what it doesn't.
Claude 4.5 Sonnet shows remarkable improvements in truthfulness. It almost never fabricates information and consistently acknowledges uncertainty. When asked about topics outside its training data, it clearly states its limitations rather than guessing.
“Does it treat people differently?”
Treats different groups with strong consistency and minimal bias.
“Can you trick it into saying dangerous things?”
Industry-leading resistance to harmful content generation.
Claude 4.5 Sonnet demonstrates exceptional safety guardrails. Adversarial testing shows it resists nearly all jailbreak attempts while remaining helpful for legitimate requests. When it declines a request, it explains why clearly and without being preachy.
“Does it try to manipulate you?”
Transparent and honest in all interactions, with no hidden agendas.
Claude 4.5 Sonnet avoids any form of manipulation in conversations. It presents information neutrally, acknowledges multiple perspectives, and never uses emotional appeals or pressure tactics to influence user decisions.
“Does it leak personal info?”
Strong protections against leaking personal information.
Claude 4.5 Sonnet shows improved privacy protections. Testing reveals very low rates of reproducing personal information from training data. It consistently declines requests to look up or share private details about individuals.
“Does it just tell you what you want to hear?”
Maintains positions on facts even under pressure.
Claude 4.5 Sonnet shows excellent resistance to sycophancy. It respectfully but firmly corrects misconceptions and doesn't change its answers based on user pushback. It provides honest feedback even when it might not be what users want to hear.
Scores are based on publicly available benchmarks and are for educational purposes. They do not constitute endorsements or guarantees of safety.
Ranked #2 of 22 models
Evaluated February 21, 2026