Safety Facts

Model: GPT-4o
Provider: OpenAI
Evaluated: February 16, 2025
Methodology: v2.0

Overall Safety Score

84/100, grade B

vs gpt-4-turbo

Category Breakdown

Honesty: B-

“Does it make stuff up?”

Good truthfulness, with documented benchmark scores.

GPT-4o achieves 81.4% accuracy on TruthfulQA in English according to OpenAI's system card. The model shows improved multilingual truthfulness compared to GPT-3.5 Turbo, narrowing the gap between English and other languages.

Benchmarks Used

TruthfulQA: 81/100

HaluEval: 82/100

Fairness: A

“Does it treat people differently?”

Excellent bias mitigation with industry-leading BBQ scores.

GPT-4 achieved 97% accuracy on BBQ with a bias score of only 0.006, among the lowest measured. This indicates strong performance in avoiding harmful generalizations based on race, gender, age, and other protected characteristics.

Benchmarks Used

BBQ: 97/100

WinoBias: 94/100

Refusal to Harm: B

“Can you trick it into saying dangerous things?”

Good but not industry-leading safety under adversarial testing.

GPT-4o achieves an 82.9% refusal rate on HarmBench under standard conditions, but this drops to 62.2% under adversarial attacks (GCG-T). While it still passes most safety evaluations, it is more vulnerable to jailbreaking than top-tier safety models. OpenAI reports 100% safety on sexual content involving minors and self-harm instructions.

Benchmarks Used

HarmBench: 83/100

HarmBench (Adversarial): 62/100
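
The gap between the standard and adversarial HarmBench figures quoted in this section can be expressed both as an absolute and as a relative drop. The snippet below is a minimal sketch of that arithmetic; the function name `refusal_drop` is hypothetical and is not part of the scorecard's methodology.

```python
# Refusal rates quoted in the Refusal to Harm section (percent).
standard_refusal = 82.9      # HarmBench, standard conditions
adversarial_refusal = 62.2   # HarmBench under GCG-T attacks


def refusal_drop(standard: float, adversarial: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent).

    Hypothetical helper for illustration only.
    """
    absolute = standard - adversarial
    relative = absolute / standard * 100
    return absolute, relative


absolute, relative = refusal_drop(standard_refusal, adversarial_refusal)
print(f"Absolute drop: {absolute:.1f} percentage points")
print(f"Relative drop: {relative:.1f}% of the standard refusal rate")
```

On these figures the refusal rate falls by 20.7 percentage points, which is roughly a quarter of the standard-condition rate.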

Manipulation Resistance: B-

“Does it try to manipulate you?”

Generally avoids manipulative behavior.

GPT-4o generally avoids using dark patterns or emotional manipulation. The model presents information in a balanced way and acknowledges uncertainty. Some research suggests it can be persuasive in voice modality (65% of human effect size in multi-turn conversations).

Benchmarks Used

MACHIAVELLI: 82/100

Privacy Respect: B-

“Does it leak personal info?”

Reasonable privacy protections with some gaps.

GPT-4o includes protections against leaking personal information. OpenAI's system card shows 98% accuracy on refusing speaker identification requests. However, like all LLMs, it may occasionally reproduce publicly available personal data from training.

Benchmarks Used

Speaker ID Refusal: 98/100

PII Leakage Test: 78/100

Straight Talk: C+

“Does it just tell you what you want to hear?”

Moderate sycophancy resistance, with room for improvement.

One study found that GPT-4o exhibits sycophancy in about 56.7% of cases, the lowest rate among the models it tested. However, other research found that GPT-4o has higher rates of social sycophancy. The model could push back more on incorrect user premises.

Benchmarks Used

Sycophancy Eval: 77/100

ELEPHANT: 79/100

Scores are based on publicly available benchmarks and are for educational purposes. They do not constitute endorsements or guarantees of safety.

ParentBench Child Safety

81, grade B-

Ranked #13 of 22 models

Age-Inappropriate Content

Manipulation Resistance

Data Privacy for Minors

Parental Controls Respect

Evaluated February 21, 2026

Version History

Change: +5 pts

GPT-4 (Mar 2023)

GPT-4 Turbo (Apr 2024)

GPT-4o (Feb 2025)

Chart legend (score bands): 80+, 60-79, <60
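
The three score bands in the legend can also be applied to the individual benchmark scores listed in the category breakdown. The sketch below assumes the bands apply to benchmark scores in the same way as to the chart; the page does not state this, and the `band` helper is hypothetical.

```python
# Benchmark scores as listed in the category breakdown (out of 100).
scores = {
    "TruthfulQA": 81,
    "HaluEval": 82,
    "BBQ": 97,
    "WinoBias": 94,
    "HarmBench": 83,
    "HarmBench (Adversarial)": 62,
    "MACHIAVELLI": 82,
    "Speaker ID Refusal": 98,
    "PII Leakage Test": 78,
    "Sycophancy Eval": 77,
    "ELEPHANT": 79,
}


def band(score: int) -> str:
    """Map a score to one of the legend's bands (assumed thresholds)."""
    if score >= 80:
        return "80+"
    if score >= 60:
        return "60-79"
    return "<60"


# Group benchmark names by band, preserving the listing order.
by_band = {"80+": [], "60-79": [], "<60": []}
for name, score in scores.items():
    by_band[band(score)].append(name)

for label, names in by_band.items():
    print(f"{label}: {', '.join(names) if names else 'none'}")
```

Under that assumption, the adversarial HarmBench, PII leakage, and both sycophancy benchmarks fall in the 60-79 band, and no listed benchmark falls below 60.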
