Safety Facts

Model: Mistral Large 2
Provider: Mistral AI
Evaluated: February 16, 2025
Methodology: v2.0
Parameters: 123B

Overall Safety Score

62/100

Grade: D-

Category Breakdown

Honesty: C-

“Does it make stuff up?”

Moderate truthfulness but hallucinates more than leaders.

Mistral Large 2 handles common factual questions well but struggles more with edge cases and can generate confident-sounding misinformation on niche topics. Its performance is below that of top-tier models.

Benchmarks Used

TruthfulQA: 71/100

HaluEval: 73/100

Fairness: D+

“Does it treat people differently?”

Shows noticeable bias patterns in demographic contexts.

Mistral Large 2 shows more bias than the leading models, particularly around cultural stereotypes. As a European-developed model, it handles European cultural contexts better but can show more bias in discussions about non-Western cultures.

Benchmarks Used

BBQ: 67/100

WinoBias: 69/100

Refusal to Harm: F

“Can you trick it into saying dangerous things?”

Significantly weaker safety: accepts most potentially unsafe prompts.

Research indicates that Mistral-series models fulfill more than half of the unsafe instructions evaluated and comply with potentially unsafe user requests at significantly higher rates. HELM Safety reports Mixtral 8x7B, another Mistral AI model, at only 45.1% on HarmBench. Claude demonstrates the highest safety, whereas Mistral models accept most potentially unsafe prompts.

Benchmarks Used

HarmBench: 45/100

HarmBench (Adversarial): 31/100

Manipulation Resistance: C-

“Does it try to manipulate you?”

Basic manipulation resistance but fewer guardrails.

Mistral Large 2 generally behaves straightforwardly in conversations. Its main weakness is that it more readily generates persuasive or manipulative content when asked, without adding the caveats or warnings that more safety-focused models include.

Benchmarks Used

MACHIAVELLI: 70/100

Privacy Respect: D

“Does it leak personal info?”

Basic privacy protections with gaps.

Mistral Large 2 has basic privacy protections but lags behind leaders. It can sometimes be prompted to share memorized personal details and doesn't always draw a clear line between public and private information.

Benchmarks Used

PII Leakage Test: 65/100

Straight Talk: D+

“Does it just tell you what you want to hear?”

Tends to agree rather than challenge incorrect claims.

Mistral Large 2 shows moderate sycophancy. It's more likely to agree with user assertions than to push back, even when claims are factually incorrect. This reduces its value as a reliable fact-checker.

Benchmarks Used

Sycophancy Eval: 68/100

Scores are based on publicly available benchmarks and are for educational purposes. They do not constitute endorsements or guarantees of safety.

ParentBench Child Safety

Score: 52, Grade: F

Ranked #21 of 22 models

Age-Inappropriate Content

Manipulation Resistance

Data Privacy for Minors

Parental Controls Respect

Evaluated February 21, 2026
