SafetyScore

Safety Facts

Model: Gemini 1.5 Pro
Provider: Google
Evaluated: January 10, 2025
Methodology: v1.0

Overall Safety Score

82/100
B- vs gemini-1-0-pro

Category Breakdown

Honesty: B

Does it make stuff up?

84

Generally truthful but can be confidently wrong about some topics.

Gemini 1.5 Pro performs solidly on truthfulness benchmarks. It handles most factual questions well and often includes appropriate caveats. However, it can occasionally generate plausible-sounding but incorrect information, particularly when synthesizing across multiple domains.

Benchmarks Used

HaluEval: 85/100

Fairness: B+

Does it treat people differently?

86

One of the better performers on fairness: it treats people largely equally.

Gemini 1.5 Pro scores well on bias benchmarks, showing relatively balanced treatment across demographic groups. Google's investment in fairness research is evident here. It handles questions about different cultures and identities with care.

Benchmarks Used

BBQ: 87/100
WinoBias: 85/100

Refusal to Harm: B+

Can you trick it into saying dangerous things?

88

Good at saying no to harmful requests, though sometimes overly cautious.

Gemini 1.5 Pro has strong safety filters. It consistently refuses to generate harmful content and handles adversarial prompts well. The trade-off is that it can sometimes be overly cautious, refusing borderline requests that other models handle safely.

Benchmarks Used

HarmBench: 89/100
AdvBench: 87/100

Manipulation Resistance: C+

Does it try to manipulate you?

78

Mostly fair, but can be nudged into one-sided arguments.

Gemini 1.5 Pro generally avoids overt manipulation but shows some susceptibility to generating one-sided persuasive content when prompted. It could be better at flagging when it's being asked to produce biased or manipulative content.

Benchmarks Used

Privacy Respect: C+

Does it leak personal info?

79

Reasonable privacy protections, with some gaps around public data.

Gemini 1.5 Pro handles most privacy scenarios appropriately. It refuses to share clearly private information but can sometimes blur the line with information that's technically public but arguably private (like home addresses found in public records).

Benchmarks Used

Straight Talk: C+

Does it just tell you what you want to hear?

77

Has a tendency to agree with you rather than challenge your assumptions.

Gemini 1.5 Pro shows noticeable sycophantic tendencies. It's more likely than some competitors to agree with a user's stated position even when that position is factually questionable. It prioritizes being agreeable over being accurate in contested discussions.
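
As a quick arithmetic check, the overall score of 82 coincides with the unweighted mean of the six category scores listed above. The methodology itself is not reproduced on this page, so equal weighting is an assumption here rather than the documented formula, and the variable names are illustrative only. A minimal sketch:

```python
from statistics import mean

# Category scores copied from the breakdown above.
category_scores = {
    "Honesty": 84,
    "Fairness": 86,
    "Refusal to Harm": 88,
    "Manipulation Resistance": 78,
    "Privacy Respect": 79,
    "Straight Talk": 77,
}

# Assumption: every category carries equal weight. The actual
# methodology (v1.0) may weight categories differently.
overall = round(mean(category_scores.values()))

print(overall)  # 82, matching the Overall Safety Score
```

If the published methodology uses different weights, the match may be a coincidence for this particular model.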

Scores are based on publicly available benchmarks and are for educational purposes. They do not constitute endorsements or guarantees of safety.