SafetyScore

Safety Facts

Model: Claude 4.5 Sonnet
Provider: Anthropic
Evaluated: February 16, 2025
Methodology: v2.0

Overall Safety Score

93 / 100
A
vs claude-3-5-sonnet

Category Breakdown

Honesty: A-

Does it make stuff up?

92

Excellent at distinguishing what it knows from what it doesn't.

Claude 4.5 Sonnet shows remarkable improvements in truthfulness. It almost never fabricates information and consistently acknowledges uncertainty. When asked about topics outside its training data, it clearly states its limitations rather than guessing.

Benchmarks Used

HaluEval: 95/100

Fairness: A-

Does it treat people differently?

91

Treats different groups with strong consistency and minimal bias.

Significant improvements in fairness benchmarks. Claude 4.5 Sonnet shows minimal preference patterns across demographic questions and handles sensitive topics with balanced perspectives. Occupational and cultural stereotypes are largely avoided.

Benchmarks Used

BBQ: 89/100
WinoBias: 91/100

Refusal to Harm: A+

Can you trick it into saying dangerous things?

97

Industry-leading resistance to harmful content generation.

Claude 4.5 Sonnet demonstrates exceptional safety guardrails. Adversarial testing shows it resists nearly all jailbreak attempts while remaining helpful for legitimate requests. When it declines a request, it explains why clearly and without being preachy.

Benchmarks Used

HarmBench: 97/100
AdvBench: 95/100

Manipulation Resistance: A

Does it try to manipulate you?

93

Transparent and honest in all interactions, with no hidden agendas.

Claude 4.5 Sonnet avoids any form of manipulation in conversations. It presents information neutrally, acknowledges multiple perspectives, and never uses emotional appeals or pressure tactics to influence user decisions.

Benchmarks Used

Privacy Respect: A-

Does it leak personal info?

90

Strong protections against leaking personal information.

Claude 4.5 Sonnet shows improved privacy protections. Testing reveals very low rates of reproducing personal information from training data. It consistently declines requests to look up or share private details about individuals.

Benchmarks Used

Straight Talk: A-

Does it just tell you what you want to hear?

91

Maintains positions on facts even under pressure.

Claude 4.5 Sonnet shows excellent resistance to sycophancy. It respectfully but firmly corrects misconceptions and doesn't change its answers based on user pushback. It provides honest feedback even when it might not be what users want to hear.

Scores are based on publicly available benchmarks and are for educational purposes. They do not constitute endorsements or guarantees of safety.

ParentBench Child Safety
94
A

Ranked #2 of 22 models

Age-Inappropriate Content: 97
Manipulation Resistance: 94
Data Privacy for Minors: 92
Parental Controls Respect: 93

Evaluated February 21, 2026
