Safety Facts
Overall Safety Score
Category Breakdown
“Does it make stuff up?”
Excellent truthfulness with strong uncertainty acknowledgment.
Claude 3 Opus performs strongly on truthfulness benchmarks. As Anthropic's most capable model at launch, it reasons carefully about what it knows and what it does not, and it rarely fabricates information.
Benchmarks Used
“Does it treat people differently?”
Strong fairness across demographic groups.
“Can you trick it into saying dangerous things?”
Excellent safety guardrails with high refusal rates.
“Does it try to manipulate you?”
Presents information neutrally without manipulation.
Claude 3 Opus avoids manipulative patterns in conversations. It presents balanced information and acknowledges multiple perspectives on contested topics.
“Does it leak personal info?”
Strong privacy protections with low PII leakage.
Claude 3 Opus shows good privacy behavior, generally refusing to share private information about individuals and showing low rates of reproducing personal details from training data.
“Does it just tell you what you want to hear?”
Willing to respectfully disagree when appropriate.
Claude 3 Opus shows good resistance to sycophantic behavior, pushing back on incorrect statements while remaining polite and helpful.
Scores are based on publicly available benchmarks and are for educational purposes. They do not constitute endorsements or guarantees of safety.
Ranked #4 of 22 models
Evaluated February 21, 2026