Safety Facts

Model: GPT-4 Turbo

Provider: OpenAI

Evaluated: February 16, 2025

Methodology: v2.0

Overall Safety Score

82/100 (B-), vs. GPT-4

Category Breakdown

Honesty: B-

“Does it make stuff up?”

Improved truthfulness over GPT-4.

GPT-4 Turbo shows improved truthfulness with updated training. It handles factual questions well and shows reasonable uncertainty calibration.

Benchmarks Used

TruthfulQA: 79/100

HaluEval: 81/100

Fairness: A

“Does it treat people differently?”

Excellent fairness matching GPT-4's strong performance.

GPT-4 Turbo maintains the strong fairness characteristics of GPT-4, with very low bias scores on standard benchmarks.

Benchmarks Used

BBQ: 95/100

WinoBias: 93/100

Refusal to Harm: B-

“Can you trick it into saying dangerous things?”

Good safety with improved context awareness.

GPT-4 Turbo shows solid safety behavior, with improvements in using context to distinguish legitimate requests from harmful ones.

Benchmarks Used

HarmBench: 82/100

AdvBench: 80/100

Manipulation Resistance: B-

“Does it try to manipulate you?”

Fair presentation of information.

GPT-4 Turbo generally presents balanced information without manipulative framing.

Benchmarks Used

MACHIAVELLI: 82/100

Privacy Respect: C+

“Does it leak personal info?”

Reasonable privacy protections.

GPT-4 Turbo shows moderate privacy behavior: it generally declines to share private information, though there is some room for improvement.

Benchmarks Used

PrivacyBench: 78/100

PII Leakage Test: 80/100

Straight Talk: C+

“Does it just tell you what you want to hear?”

Moderate sycophancy, similar to GPT-4.

GPT-4 Turbo shows sycophancy patterns similar to GPT-4's, sometimes agreeing with users rather than correcting them.

Benchmarks Used

Sycophancy Eval: 75/100

TruthfulQA (sycophancy subset): 77/100

Scores are based on publicly available benchmarks and are for educational purposes. They do not constitute endorsements or guarantees of safety.
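The page does not state how benchmark results are combined into category scores or the overall score. As a rough sanity check only, the sketch below assumes (and this is an assumption, not the published methodology) that each category score is the unweighted mean of its benchmarks and that the overall score is the unweighted mean of the category scores. The function names are illustrative.

```python
from statistics import mean

# Benchmark scores copied from the category breakdown on this page.
BENCHMARKS = {
    "Honesty": {"TruthfulQA": 79, "HaluEval": 81},
    "Fairness": {"BBQ": 95, "WinoBias": 93},
    "Refusal to Harm": {"HarmBench": 82, "AdvBench": 80},
    "Manipulation Resistance": {"MACHIAVELLI": 82},
    "Privacy Respect": {"PrivacyBench": 78, "PII Leakage Test": 80},
    "Straight Talk": {
        "Sycophancy Eval": 75,
        "TruthfulQA (sycophancy subset)": 77,
    },
}


def category_scores(benchmarks):
    """ASSUMPTION: a category score is the unweighted mean of its benchmarks."""
    return {name: mean(scores.values()) for name, scores in benchmarks.items()}


def overall_score(benchmarks):
    """ASSUMPTION: the overall score is the unweighted mean of category scores."""
    return mean(category_scores(benchmarks).values())


if __name__ == "__main__":
    for name, score in category_scores(BENCHMARKS).items():
        print(f"{name}: {score:.0f}")
    print(f"Overall: {overall_score(BENCHMARKS):.0f}")
```

Under these assumptions the category means come out to 80, 94, 81, 82, 79, and 76, and their mean is 82, which agrees with the overall score shown on this page. That agreement is suggestive but does not confirm the actual weighting, which is defined in the linked methodology.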

ParentBench Child Safety

78 (C+)

Ranked #16 of 22 models

Age-Inappropriate Content

Manipulation Resistance

Data Privacy for Minors

Parental Controls Respect

Evaluated February 21, 2026

Version History

Change: +3 pts

GPT-4 (Mar 2023)

GPT-4 Turbo (Feb 2025)

Score bands: 80+, 60-79, <60
