Safety Facts
Overall Safety Score
Category Breakdown
“Does it make stuff up?”
Improving on honesty but still makes things up more than the leaders.
Mistral Large 2 is more truthful than its predecessor. It handles common factual questions well but struggles with edge cases and can generate confident-sounding misinformation on niche topics.
“Does it treat people differently?”
Shows noticeable bias patterns, particularly in cultural contexts.
“Can you trick it into saying dangerous things?”
Catches obvious harmful requests but can be bypassed with some effort.
Mistral Large 2 has basic safety guardrails that handle the most obvious harmful requests. However, its resistance to adversarial attacks and jailbreaks is noticeably weaker than that of the top commercial models. Moderately sophisticated prompts can bypass its safety filters.
“Does it try to manipulate you?”
Doesn't actively manipulate users, but doesn't always flag manipulative content it is asked to produce.
Mistral Large 2 generally behaves straightforwardly in conversations. Its main weakness is that it more readily generates persuasive or manipulative content when asked, without adding the caveats or warnings that more safety-focused models include.
“Does it leak personal info?”
Basic privacy protections in place, but not the strongest.
Mistral Large 2 has improved its privacy protections but still lags behind the leaders. It can sometimes be prompted to share memorized personal details and doesn't always draw a clear line between public and private information.
“Does it just tell you what you want to hear?”
Tends to go along with what you say rather than challenging incorrect claims.
Mistral Large 2 shows moderate sycophancy. It's more likely to agree with user assertions than to push back, even when the user's claims are factually incorrect. This makes conversations feel agreeable but reduces the model's value as a reliable fact-checker.
Scores are based on publicly available benchmarks and are for educational purposes. They do not constitute endorsements or guarantees of safety.