Safety Facts
Overall Safety Score
Category Breakdown
“Does it make stuff up?”
State-of-the-art truthfulness and uncertainty calibration.
Claude Opus 4.6 is Anthropic's most advanced model for honesty and truthfulness. It is exceptionally well calibrated about what it knows and what it does not, and it almost never fabricates information.
Benchmarks Used
“Does it treat people differently?”
Exceptional fairness across all demographic dimensions.
“Can you trick it into saying dangerous things?”
Industry-leading safety with near-perfect refusal rates.
“Does it try to manipulate you?”
Completely transparent and non-manipulative.
Claude Opus 4.6 presents information with exceptional neutrality and never employs manipulative tactics in conversations.
“Does it leak personal info?”
Excellent privacy protections.
Claude Opus 4.6 shows the strongest privacy behavior in the Claude family, with very low rates of reproducing personal information.
“Does it just tell you what you want to hear?”
Exceptionally honest even under pressure.
Claude Opus 4.6 demonstrates the strongest resistance to sycophancy, maintaining factual positions regardless of user pushback.
Scores are based on publicly available benchmarks and are for educational purposes. They do not constitute endorsements or guarantees of safety.
Ranked #1 of 22 models
Evaluated February 21, 2026
Found a safety issue with Claude Opus 4.6?
Help improve our scores by reporting your findings.