Safety Facts

Model: Claude 4.5 Sonnet; Provider: Anthropic; Evaluated: February 16, 2025; Methodology: v2.0

Overall Safety Score

93/100 (Grade A, vs claude-3-5-sonnet)

Category Breakdown

Honesty: A-

“Does it make stuff up?”

Excellent at distinguishing what it knows from what it doesn't.

Claude 4.5 Sonnet shows remarkable improvements in truthfulness. It almost never fabricates information and consistently acknowledges uncertainty. When asked about topics outside its training data, it clearly states its limitations rather than guessing.

Benchmarks Used

TruthfulQA: 93/100

HaluEval: 95/100

Fairness: A-

“Does it treat people differently?”

Treats different groups with strong consistency and minimal bias.

Claude 4.5 Sonnet shows significant improvements on fairness benchmarks. It shows minimal preference patterns across demographic questions and handles sensitive topics with balanced perspectives. Occupational and cultural stereotypes are largely avoided.

Benchmarks Used

BBQ: 89/100

WinoBias: 91/100

Refusal to Harm: A+

“Can you trick it into saying dangerous things?”

Industry-leading resistance to harmful content generation.

Claude 4.5 Sonnet demonstrates exceptional safety guardrails. Adversarial testing shows it resists nearly all jailbreak attempts while remaining helpful for legitimate requests. When it declines a request, it explains why clearly and without being preachy.

Benchmarks Used

HarmBench: 97/100

AdvBench: 95/100

Manipulation Resistance: A

“Does it try to manipulate you?”

Transparent and honest in all interactions, with no hidden agendas.

Claude 4.5 Sonnet avoids any form of manipulation in conversations. It presents information neutrally, acknowledges multiple perspectives, and never uses emotional appeals or pressure tactics to influence user decisions.

Benchmarks Used

MACHIAVELLI: 91/100

Privacy Respect: A-

“Does it leak personal info?”

Strong protections against leaking personal information.

Claude 4.5 Sonnet shows improved privacy protections. Testing reveals very low rates of reproducing personal information from training data. It consistently declines requests to look up or share private details about individuals.

Benchmarks Used

PrivacyBench: 88/100

PII Leakage Test: 90/100

Straight Talk: A-

“Does it just tell you what you want to hear?”

Maintains positions on facts even under pressure.

Claude 4.5 Sonnet shows excellent resistance to sycophancy. It respectfully but firmly corrects misconceptions and doesn't change its answers based on user pushback. It provides honest feedback even when it might not be what users want to hear.

Benchmarks Used

Sycophancy Eval: 91/100

TruthfulQA (sycophancy subset): 93/100
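
The page does not say how benchmark scores are combined into category grades or into the overall score. As a rough cross-check only, the sketch below averages the benchmark scores listed above within each category and then averages the categories. The function names and the unweighted-mean rule are assumptions made for illustration, not the site's methodology.

```python
from statistics import mean

# Benchmark scores copied from the category breakdown above (each out of 100).
BENCHMARKS = {
    "Honesty": {"TruthfulQA": 93, "HaluEval": 95},
    "Fairness": {"BBQ": 89, "WinoBias": 91},
    "Refusal to Harm": {"HarmBench": 97, "AdvBench": 95},
    "Manipulation Resistance": {"MACHIAVELLI": 91},
    "Privacy Respect": {"PrivacyBench": 88, "PII Leakage Test": 90},
    "Straight Talk": {
        "Sycophancy Eval": 91,
        "TruthfulQA (sycophancy subset)": 93,
    },
}


def category_means(benchmarks):
    """Unweighted mean of the benchmark scores in each category (assumed rule)."""
    return {name: mean(scores.values()) for name, scores in benchmarks.items()}


def unweighted_overall(benchmarks):
    """Unweighted mean of the category means (assumed rule, not the site's)."""
    return mean(category_means(benchmarks).values())


if __name__ == "__main__":
    for name, value in category_means(BENCHMARKS).items():
        print(f"{name}: {value}")
    print(f"Unweighted overall: {unweighted_overall(BENCHMARKS)}")
```

Under this assumed rule the category means are 94, 90, 96, 91, 89, and 92, and their mean is 92, one point below the published overall score of 93. That suggests the published score uses weighting or rounding that this page does not describe.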

Scores are based on publicly available benchmarks and are for educational purposes. They do not constitute endorsements or guarantees of safety.

ParentBench Child Safety

94 (Grade A)

Ranked #2 of 22 models


Age-Inappropriate Content

Manipulation Resistance

Data Privacy for Minors

Parental Controls Respect

Evaluated February 21, 2026

Found a safety issue with Claude 4.5 Sonnet?

Help improve our scores by reporting your findings.
