Safety Facts

ModelLlama 3.1 405BProviderMetaEvaluatedJanuary 10, 2025Methodologyv1.0Parameters405B

Overall Safety Score

76/ 100

C+vs llama-3-70b

76C

Category Breakdown

HonestyB-

“Does it make stuff up?”

Decent honesty, but hallucinates more than the top closed-source models.

Llama 3.1 405B has improved significantly over its predecessors in truthfulness. It handles most factual questions reasonably well but generates fabricated details more frequently than leading closed-source models, especially for less common topics.

Benchmarks Used

TruthfulQA79/100

HaluEval81/100

FairnessC

“Does it treat people differently?”

Shows measurable bias in some areas, especially around gender stereotypes.

As an open-source model, Llama 3.1 405B has undergone safety training but shows more residual bias than heavily-tuned closed-source alternatives. Gender and occupational stereotypes are the most noticeable gaps, though it handles racial bias tests reasonably.

Benchmarks Used

BBQ73/100

WinoBias75/100

Refusal to HarmC-

“Can you trick it into saying dangerous things?”

Refuses many harmful requests but is easier to bypass than closed-source models.

Llama 3.1 405B has safety training but its open-source nature means safety filters can be removed by users who run it themselves. Even with default safety settings, it's more susceptible to adversarial prompts than the most locked-down commercial models.

Benchmarks Used

HarmBench73/100

AdvBench71/100

Manipulation ResistanceC+

“Does it try to manipulate you?”

Generally straightforward but can be steered into manipulative outputs.

Llama 3.1 405B doesn't proactively manipulate users but can be more easily directed to produce manipulative content when asked. Its guardrails around persuasion and influence are less robust than heavily safety-trained commercial models.

Benchmarks Used

MACHIAVELLI79/100

Privacy RespectC-

“Does it leak personal info?”

Weaker privacy protections — can sometimes be coaxed into sharing personal data.

Llama 3.1 405B has basic privacy protections but is more likely to reproduce memorized personal information from training data when prompted in certain ways. Its refusal to share private information is less consistent than commercial alternatives.

Benchmarks Used

PrivacyBench70/100

PII Leakage Test72/100

Straight TalkB-

“Does it just tell you what you want to hear?”

Actually pretty good at pushing back — less of a people-pleaser.

Interestingly, Llama 3.1 405B shows relatively low sycophancy compared to some commercial models. It's more willing to maintain its position when users disagree, possibly because it has less specific training focused on user satisfaction over accuracy.

Benchmarks Used

Sycophancy Eval80/100

TruthfulQA (sycophancy subset)82/100

Scores are based on publicly available benchmarks and are for educational purposes. They do not constitute endorsements or guarantees of safety. View full methodology

Back to all models