SafetyScore

Safety Facts

Model: GPT-4o
Provider: OpenAI
Evaluated: February 16, 2025
Methodology: v2.0

Overall Safety Score

84/100 (B), compared against gpt-4-turbo

Category Breakdown

Honesty: B-

Does it make stuff up?

81

Good truthfulness, with documented benchmark scores.

GPT-4o achieves 81.4% accuracy on TruthfulQA in English according to OpenAI's system card. The model shows improved multilingual truthfulness compared to GPT-3.5 Turbo, narrowing the gap between English and other languages.

Benchmarks Used

HaluEval: 82/100

Fairness: A

Does it treat people differently?

96

Excellent bias mitigation with industry-leading BBQ scores.

GPT-4 achieved 97% accuracy on BBQ with a bias score of only 0.006, among the lowest measured. This indicates strong performance in avoiding harmful generalizations based on race, gender, age, and other protected characteristics.

Benchmarks Used

BBQ: 97/100
WinoBias: 94/100

Refusal to Harm: B

Can you trick it into saying dangerous things?

83

Good but not industry-leading safety under adversarial testing.

GPT-4o achieves an 82.9% refusal rate on HarmBench under standard conditions, but this drops to 62.2% under adversarial attacks (GCG-T). While it still passes most safety evaluations, it is more vulnerable to jailbreaking than top-tier safety models. OpenAI reports 100% safety on sexual content involving minors and self-harm instructions.

Manipulation Resistance: B-

Does it try to manipulate you?

82

Generally avoids manipulative behavior.

GPT-4o generally avoids using dark patterns or emotional manipulation. The model presents information in a balanced way and acknowledges uncertainty. Some research suggests it can be persuasive in voice modality (65% of human effect size in multi-turn conversations).

Privacy Respect: B-

Does it leak personal info?

80

Reasonable privacy protections with some gaps.

GPT-4o includes protections against leaking personal information. OpenAI's system card shows 98% accuracy on refusing speaker identification requests. However, like all LLMs, it may occasionally reproduce publicly available personal data from training.

Straight Talk: C+

Does it just tell you what you want to hear?

78

Moderate sycophancy resistance, with room for improvement.

One study found that GPT-4o exhibits sycophancy in approximately 56.7% of cases, the lowest rate among the models it tested. However, other research found that GPT-4o has higher rates of social sycophancy. The model has room for improvement in pushing back on incorrect user premises.

Scores are based on publicly available benchmarks and are for educational purposes. They do not constitute endorsements or guarantees of safety.
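The page does not state how numeric scores map to letter grades. The sketch below is one mapping that happens to reproduce every score and grade pair shown above (84 is B, 96 is A, 83 is B, 80 to 82 is B-, 78 is C+). The thresholds in `GRADE_BANDS` are an assumption based on a conventional plus/minus scale, not SafetyScore's published methodology; only the boundary between B- and B (82 vs 83) is directly pinned down by the scores on this page. The names `GRADE_BANDS`, `letter_grade`, and `PAGE_SCORES` are hypothetical.

```python
# Hypothetical grade bands (minimum score, letter), highest first.
# ASSUMPTION: conventional plus/minus cutoffs; the page does not publish its scale.
GRADE_BANDS = [
    (93, "A"),
    (90, "A-"),
    (87, "B+"),
    (83, "B"),
    (80, "B-"),
    (77, "C+"),
    (73, "C"),
    (70, "C-"),
    (60, "D"),
]


def letter_grade(score: int) -> str:
    """Return the first band whose minimum the score meets, else 'F'."""
    for minimum, grade in GRADE_BANDS:
        if score >= minimum:
            return grade
    return "F"


# Score and grade pairs copied from the scorecard above.
PAGE_SCORES = {
    "Overall": (84, "B"),
    "Honesty": (81, "B-"),
    "Fairness": (96, "A"),
    "Refusal to Harm": (83, "B"),
    "Manipulation Resistance": (82, "B-"),
    "Privacy Respect": (80, "B-"),
    "Straight Talk": (78, "C+"),
}

if __name__ == "__main__":
    for name, (score, shown) in PAGE_SCORES.items():
        print(f"{name}: {score} -> {letter_grade(score)} (page shows {shown})")
```

Bands below C+ and above B are not exercised by any score on this page, so they are the least certain part of the sketch.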

ParentBench Child Safety
81 (B-)

Ranked #13 of 22 models

Age-Inappropriate Content: 83
Manipulation Resistance: 80
Data Privacy for Minors: 78
Parental Controls Respect: 80

Evaluated February 21, 2026

Version History

Change: +5 pts
GPT-4 (Mar 2023): 79
GPT-4 Turbo (Apr 2024): 82
GPT-4o (Feb 2025): 84
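The "+5 pts" figure appears to be the difference between the newest and oldest entries in the version history (84 for GPT-4o minus 79 for GPT-4). That reading is an assumption, since the page does not say which baseline it uses. A minimal sketch of the arithmetic, with hypothetical names `HISTORY`, `total_change`, and `step_changes`:

```python
# Version history as shown on the page: (model, evaluation date, overall score).
HISTORY = [
    ("GPT-4", "Mar 2023", 79),
    ("GPT-4 Turbo", "Apr 2024", 82),
    ("GPT-4o", "Feb 2025", 84),
]


def total_change(history):
    """Score of the last entry minus score of the first entry."""
    return history[-1][2] - history[0][2]


def step_changes(history):
    """Change in score between each pair of consecutive versions."""
    return [
        (current[0], current[2] - previous[2])
        for previous, current in zip(history, history[1:])
    ]


if __name__ == "__main__":
    print(f"Total change: {total_change(HISTORY):+d} pts")
    for model, delta in step_changes(HISTORY):
        print(f"{model}: {delta:+d} pts vs previous version")
```

Under this reading, the step from GPT-4 Turbo to GPT-4o alone is +2 points, and the +5 spans the full history.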
