Safety Facts
Overall Safety Score
Category Breakdown
“Does it make stuff up?”
Good truthfulness, with documented benchmark scores.
GPT-4o achieves 81.4% accuracy on TruthfulQA in English according to OpenAI's system card. The model shows improved multilingual truthfulness compared to GPT-3.5 Turbo, narrowing the gap between English and other languages.
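An accuracy figure like the TruthfulQA score above is just the fraction of graded answers judged truthful. The sketch below is illustrative only: the questions, grades, and grading procedure are hypothetical placeholders, not OpenAI's actual evaluation pipeline.

```python
# Hypothetical sketch: computing a TruthfulQA-style accuracy score.
# The grades below are toy data, not real benchmark results.

def accuracy(graded):
    """graded: list of booleans, True if an answer was judged truthful."""
    return sum(graded) / len(graded)

# Toy grades for 5 questions (True = truthful answer).
grades = [True, True, False, True, True]
print(f"{accuracy(grades):.1%}")  # 80.0%
```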
“Does it treat people differently?”
Excellent bias mitigation with industry-leading BBQ scores.
“Can you trick it into saying dangerous things?”
Good but not industry-leading safety under adversarial testing.
GPT-4o achieves an 82.9% refusal rate on HarmBench under standard conditions, but this drops to 62.2% under adversarial attacks (GCG-T). While it still passes most safety evaluations, it is more vulnerable to jailbreaking than the top-tier safety models. OpenAI reports 100% safe responses on evaluations covering sexual content involving minors and self-harm instructions.
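The gap between the standard and adversarial numbers comes from running the same refusal measurement on two prompt sets, one with attack suffixes appended. The sketch below is a toy illustration: the prompts and the `refused()` judge are hypothetical stand-ins, not HarmBench's classifier.

```python
# Hypothetical sketch: measuring how a refusal rate degrades under
# adversarial (e.g. GCG-style) prompts. Responses and the judge are
# illustrative placeholders, not the actual HarmBench pipeline.

def refusal_rate(responses, refused):
    """Fraction of responses the judge marks as refusals."""
    return sum(refused(r) for r in responses) / len(responses)

def refused(response):
    # Toy judge; real evaluations use trained classifiers or human review.
    return response.lower().startswith("i can't")

standard = ["I can't help with that.", "I can't assist.", "Here is how..."]
adversarial = ["Here is how...", "I can't help with that.", "Sure, step 1..."]

print(f"standard:    {refusal_rate(standard, refused):.1%}")     # 66.7%
print(f"adversarial: {refusal_rate(adversarial, refused):.1%}")  # 33.3%
```

The same harness run on both prompt sets makes the degradation directly comparable, which is how a drop like 82.9% to 62.2% is typically reported.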
“Does it try to manipulate you?”
Generally avoids manipulative behavior.
GPT-4o generally avoids dark patterns and emotional manipulation. The model presents information in a balanced way and acknowledges uncertainty. Some research suggests it can be persuasive in the voice modality, reaching roughly 65% of the human effect size in multi-turn conversations.
“Does it leak personal info?”
Reasonable privacy protections with some gaps.
GPT-4o includes protections against leaking personal information. OpenAI's system card reports 98% accuracy in refusing speaker-identification requests. However, like all LLMs, it may occasionally reproduce publicly available personal information from its training data.
“Does it just tell you what you want to hear?”
Moderate sycophancy resistance, with room for improvement.
In one study, GPT-4o exhibited sycophancy in roughly 56.7% of cases, the lowest rate among the models tested; other research, however, found higher rates of social sycophancy. The model still has room for improvement in pushing back on incorrect user premises.
Scores are based on publicly available benchmarks and are for educational purposes. They do not constitute endorsements or guarantees of safety.
Ranked #13 of 22 models
Evaluated February 21, 2026