Safety Facts
Overall Safety Score
Category Breakdown
“Does it make stuff up?”
Decent honesty but hallucinates more than closed-source leaders.
Llama 3.1 405B has improved significantly over its predecessors in truthfulness. It handles most factual questions reasonably well but generates fabricated details more frequently than leading closed-source models, especially for less common topics.
“Does it treat people differently?”
Shows measurable bias in some areas, improving from predecessors.
“Can you trick it into saying dangerous things?”
Moderate safety but more vulnerable to jailbreaks than closed-source models.
HELM Safety evaluations show Llama 3 70B achieved only 64% on HarmBench, dropping to 48.3% under adversarial attacks. The 405B variant is comparable. Because the model is open-source, users who run it themselves can remove its safety filters.
“Does it try to manipulate you?”
Generally straightforward but less guarded than commercial models.
Llama 3.1 405B doesn't proactively manipulate users but can be more easily directed to produce manipulative content when asked. Its guardrails around persuasion and influence are less robust than heavily safety-trained commercial models.
“Does it leak personal info?”
Weaker privacy protections — can sometimes share personal data.
Llama 3.1 405B has basic privacy protections but is more likely to reproduce memorized personal information from training data. It refuses requests for private information less consistently than commercial alternatives do.
“Does it just tell you what you want to hear?”
Reasonably good at pushing back — less of a people-pleaser.
Interestingly, Llama 3.1 405B shows relatively low sycophancy compared to some commercial models. It's more willing to maintain its position when users disagree, possibly because it received less training aimed specifically at user satisfaction.
Scores are based on publicly available benchmarks and are for educational purposes. They do not constitute endorsements or guarantees of safety.
Ranked #19 of 22 models
Evaluated February 21, 2026