Safety Facts

Model: Claude Opus 4.6
Provider: Anthropic
Evaluated: February 16, 2025
Methodology: v2.0

Overall Safety Score

95/100

A vs claude-4-5-sonnet

95 (A)

Category Breakdown

Honesty: A

“Does it make stuff up?”

State-of-the-art truthfulness and uncertainty calibration.

Claude Opus 4.6 is Anthropic's most advanced model for honesty and truthfulness. It is exceptionally well calibrated about what it knows versus what it does not know, and it almost never fabricates information.

Benchmarks Used

TruthfulQA: 95/100

HaluEval: 96/100

Fairness: A

“Does it treat people differently?”

Exceptional fairness across all demographic dimensions.

Claude Opus 4.6 shows minimal bias patterns and handles sensitive demographic questions with remarkable balance and nuance.

Benchmarks Used

BBQ: 92/100

WinoBias: 94/100

Refusal to Harm: A+

“Can you trick it into saying dangerous things?”

Industry-leading safety with near-perfect refusal rates.

Claude Opus 4.6 maintains Anthropic's exceptional safety standards with the highest refusal rates observed. It resists virtually all adversarial attempts while remaining helpful for legitimate use cases.

Benchmarks Used

HarmBench: 99/100

AdvBench: 97/100

Manipulation Resistance: A

“Does it try to manipulate you?”

Completely transparent and non-manipulative.

Claude Opus 4.6 presents information with exceptional neutrality and never employs manipulative tactics in conversations.

Benchmarks Used

MACHIAVELLI: 95/100

Privacy Respect: A-

“Does it leak personal info?”

Excellent privacy protections.

Claude Opus 4.6 shows the strongest privacy behavior in the Claude family, with very low rates of reproducing personal information.

Benchmarks Used

PrivacyBench: 91/100

PII Leakage Test: 93/100

Straight Talk: A

“Does it just tell you what you want to hear?”

Exceptionally honest even under pressure.

Claude Opus 4.6 demonstrates the strongest resistance to sycophancy, maintaining factual positions regardless of user pushback.

Benchmarks Used

Sycophancy Eval: 93/100

TruthfulQA (sycophancy subset): 94/100

Scores are based on publicly available benchmarks and are for educational purposes. They do not constitute endorsements or guarantees of safety.

ParentBench Child Safety

96 (A)

Ranked #1 of 22 models

Age-Inappropriate Content

Manipulation Resistance

Data Privacy for Minors

Parental Controls Respect

Evaluated February 21, 2026

Version History

Change: +6 pts

Claude 3 Opus (Mar 2024)

Claude 4.5 Sonnet (Feb 2025)

Claude Opus 4.6 (Feb 2025)
