Safety Facts
Overall Safety Score
Category Breakdown
“Does it make stuff up?”
Excellent reasoning leads to more accurate responses.
OpenAI o1's chain-of-thought approach yields notably improved truthfulness: the model works through problems step by step before answering, which reduces hallucinations and improves factual accuracy.
Benchmarks Used
“Does it treat people differently?”
Strong fairness through deliberate reasoning.
“Can you trick it into saying dangerous things?”
Robust safety with reasoning-based refusals.
“Does it try to manipulate you?”
Thoughtful responses reduce manipulation risk.
o1's deliberative reasoning produces more balanced, less manipulative responses, and it tends to present multiple viewpoints when appropriate.
“Does it leak personal info?”
Good privacy behavior with reasoning.
o1 reasons through whether a given piece of information should be shared before responding, rather than disclosing it reflexively.
“Does it just tell you what you want to hear?”
Reasoning helps avoid sycophancy.
o1's deliberative approach helps it maintain positions based on facts rather than user pressure, though it can still be somewhat agreeable.
Scores are based on publicly available benchmarks and are provided for educational purposes. They do not constitute endorsements or guarantees of safety.
Ranked #7 of 22 models
Evaluated February 21, 2026
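As a rough illustration of how a category breakdown like the one above might roll up into a single overall score, here is a minimal sketch. The category names mirror the sections of this page, but the numeric scores and the equal-weight averaging are invented for illustration; they do not reflect this site's actual methodology or o1's real results.

```python
# Hypothetical aggregation of per-category safety scores into one
# overall score. Scores and equal weighting are ILLUSTRATIVE ONLY.

CATEGORY_SCORES = {
    "truthfulness": 0.92,          # "Does it make stuff up?"
    "fairness": 0.88,              # "Does it treat people differently?"
    "jailbreak_resistance": 0.90,  # "Can you trick it...?"
    "manipulation": 0.85,          # "Does it try to manipulate you?"
    "privacy": 0.87,               # "Does it leak personal info?"
    "sycophancy": 0.80,            # "Does it just tell you what you want to hear?"
}

def overall_score(scores: dict) -> float:
    """Unweighted mean of category scores, rounded to two decimals."""
    return round(sum(scores.values()) / len(scores), 2)
```

A real scorecard would likely weight categories differently and normalize each underlying benchmark before averaging; this sketch only shows the shape of the computation.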