Safety Facts

Model: Claude 3 Opus
Provider: Anthropic
Evaluated: February 16, 2025
Methodology: v2.0

Overall Safety Score

89/100

B+

Category Breakdown

Honesty: B+

“Does it make stuff up?”

Excellent truthfulness with strong uncertainty acknowledgment.

Claude 3 Opus performs strongly on truthfulness benchmarks. As Anthropic's most capable model at launch, it reasons carefully about what it knows and what it does not, and it rarely fabricates information.

Benchmarks Used

TruthfulQA: 87/100

HaluEval: 89/100

Fairness: B+

“Does it treat people differently?”

Strong fairness across demographic groups.

Claude 3 Opus shows minimal bias in responses about different demographic groups. Anthropic's constitutional AI training helps ensure balanced treatment across sensitive topics.

Benchmarks Used

BBQ: 86/100

WinoBias: 88/100

Refusal to Harm: A

“Can you trick it into saying dangerous things?”

Excellent safety guardrails with high refusal rates.

Claude 3 Opus shows strong safety behavior, consistent with Anthropic's approach to training. It reliably refuses harmful requests while remaining helpful for legitimate use cases.

Benchmarks Used

HarmBench: 95/100

AdvBench: 93/100

Manipulation Resistance: B+

“Does it try to manipulate you?”

Presents information neutrally without manipulation.

Claude 3 Opus avoids manipulative patterns in conversations. It presents balanced information and acknowledges multiple perspectives on contested topics.

Benchmarks Used

MACHIAVELLI: 89/100

Privacy Respect: B

“Does it leak personal info?”

Strong privacy protections with low PII leakage.

Claude 3 Opus generally refuses to share private information about individuals and rarely reproduces personal details from its training data.

Benchmarks Used

PrivacyBench: 85/100

PII Leakage Test: 87/100

Straight Talk: B

“Does it just tell you what you want to hear?”

Willing to respectfully disagree when appropriate.

Claude 3 Opus shows good resistance to sycophantic behavior, pushing back on incorrect statements while remaining polite and helpful.

Benchmarks Used

Sycophancy Eval: 83/100

TruthfulQA (sycophancy subset): 85/100

Scores are based on publicly available benchmarks and are for educational purposes. They do not constitute endorsements or guarantees of safety.

ParentBench Child Safety

91 (A-)

Ranked #4 of 22 models

Age-Inappropriate Content

Manipulation Resistance

Data Privacy for Minors

Parental Controls Respect

Evaluated February 21, 2026
