SafetyScore

Safety Facts

Model: GPT-4.5
Provider: OpenAI
Evaluated: February 16, 2025
Methodology: v2.0

Overall Safety Score

86/100 (B)
Compared against: GPT-4o

Category Breakdown

Honesty: B

Does it make stuff up?

85

Significantly improved at acknowledging its own limitations.

GPT-4.5 shows strong improvements in truthfulness compared to GPT-4o. It's better at expressing uncertainty and less likely to confidently state incorrect information. Hallucination rates have dropped noticeably.

Benchmarks Used

HaluEval: 92/100

Fairness: A

Does it treat people differently?

94

Good performance on bias benchmarks with some room to improve.

GPT-4.5 handles demographic questions more carefully than its predecessor. While not perfect, it shows reduced bias in occupational and cultural contexts. It generally provides balanced perspectives on sensitive topics.

Benchmarks Used

BBQ: 86/100
WinoBias: 88/100

Refusal to Harm: B+

Can you trick it into saying dangerous things?

88

Very robust safety filters with improved nuance.

GPT-4.5 maintains strong safety guardrails while being less prone to over-refusal. It reliably blocks harmful content generation attempts while better understanding context for legitimate edge cases.

Benchmarks Used

HarmBench: 94/100
AdvBench: 92/100

Manipulation Resistance: B

Does it try to manipulate you?

84

Presents information fairly without pushing hidden agendas.

GPT-4.5 generally avoids manipulative patterns in its responses. It presents balanced viewpoints and doesn't use emotional manipulation or pressure tactics to influence decisions.

Privacy Respect: B

Does it leak personal info?

83

Good at protecting personal information with some caveats.

GPT-4.5 shows improved privacy protections. It generally refuses to share private information about individuals and shows lower rates of reproducing personal details from training data.

Straight Talk: B-

Does it just tell you what you want to hear?

80

More willing to disagree when users are mistaken.

GPT-4.5 shows improved resistance to sycophantic behavior. It is more likely to politely correct users who state incorrect information than to simply agree in order to please them.

Scores are based on publicly available benchmarks and are for educational purposes. They do not constitute endorsements or guarantees of safety.

ParentBench Child Safety
85 (B)

Ranked #10 of 22 models

Age-Inappropriate Content: 88
Manipulation Resistance: 84
Data Privacy for Minors: 83
Parental Controls Respect: 85
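
As a quick consistency check, both headline numbers on this page match a plain average of their component scores. The sketch below assumes equal weighting, which this page does not state; the actual v2.0 methodology may combine scores differently. The helper name `unweighted_score` is hypothetical.

```python
from statistics import mean

# Category scores as listed on this scorecard.
category_scores = {
    "Honesty": 85,
    "Fairness": 94,
    "Refusal to Harm": 88,
    "Manipulation Resistance": 84,
    "Privacy Respect": 83,
    "Straight Talk": 80,
}

# ParentBench sub-scores as listed on this scorecard.
parentbench_scores = {
    "Age-Inappropriate Content": 88,
    "Manipulation Resistance": 84,
    "Data Privacy for Minors": 83,
    "Parental Controls Respect": 85,
}


def unweighted_score(scores):
    """Hypothetical aggregation (an assumption, not the published method):
    the plain mean of the component scores, rounded to the nearest integer."""
    return round(mean(scores.values()))


overall = unweighted_score(category_scores)        # mean of the six categories
parentbench = unweighted_score(parentbench_scores)  # mean of the four sub-scores

print(f"Overall (equal weights): {overall}")
print(f"ParentBench (equal weights): {parentbench}")
```

Under this assumption the results agree with the reported 86 overall and 85 for ParentBench, but agreement on one model does not confirm that equal weighting is the method actually used.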

Evaluated February 21, 2026
