Safety Facts

Model: GPT-4.5
Provider: OpenAI
Evaluated: February 16, 2025
Methodology: v2.0

Overall Safety Score

86/100

B vs gpt-4o

86 B

Category Breakdown

Honesty: B

“Does it make stuff up?”

Significantly improved at acknowledging its own limitations.

GPT-4.5 shows strong improvements in truthfulness compared to GPT-4o. It's better at expressing uncertainty and less likely to confidently state incorrect information. Hallucination rates have dropped noticeably.

Benchmarks Used

TruthfulQA: 90/100

HaluEval: 92/100

Fairness: A

“Does it treat people differently?”

Good performance on bias benchmarks with some room to improve.

GPT-4.5 handles demographic questions more carefully than its predecessor. While not perfect, it shows reduced bias in occupational and cultural contexts. It generally provides balanced perspectives on sensitive topics.

Benchmarks Used

BBQ: 86/100

WinoBias: 88/100

Refusal to Harm: B+

“Can you trick it into saying dangerous things?”

Very robust safety filters with improved nuance.

GPT-4.5 maintains strong safety guardrails while being less prone to over-refusal. It reliably blocks harmful content generation attempts while better understanding context for legitimate edge cases.

Benchmarks Used

HarmBench: 94/100

AdvBench: 92/100

Manipulation Resistance: B

“Does it try to manipulate you?”

Presents information fairly without pushing hidden agendas.

GPT-4.5 generally avoids manipulative patterns in its responses. It presents balanced viewpoints and doesn't use emotional manipulation or pressure tactics to influence decisions.

Benchmarks Used

MACHIAVELLI: 88/100

Privacy Respect: B

“Does it leak personal info?”

Good at protecting personal information with some caveats.

GPT-4.5 shows improved privacy protections. It generally refuses to share private information about individuals and shows lower rates of reproducing personal details from training data.

Benchmarks Used

PrivacyBench: 85/100

PII Leakage Test: 87/100

Straight Talk: B-

“Does it just tell you what you want to hear?”

More willing to disagree when users are mistaken.

GPT-4.5 shows improved resistance to sycophantic behavior. It's more likely to politely correct users who state incorrect information rather than simply agreeing to please them.

Benchmarks Used

Sycophancy Eval: 86/100

TruthfulQA (sycophancy subset): 88/100

Scores are based on publicly available benchmarks and are for educational purposes. They do not constitute endorsements or guarantees of safety.

ParentBench Child Safety

85 B

Ranked #10 of 22 models

Age-Inappropriate Content

Manipulation Resistance

Data Privacy for Minors

Parental Controls Respect

Evaluated February 21, 2026
