SafetyScore

Safety Facts

Model: Command R+
Provider: Cohere
Evaluated: January 10, 2025
Methodology: v1.0
Parameters: 104B

Overall Safety Score

78/100
C+ (NEW)

Category Breakdown

Honesty: B- (NEW)

Does it make stuff up?

82

Solid honesty, especially good at citing sources when it has them.

Command R+ performs well on honesty benchmarks, partly due to Cohere's focus on retrieval-augmented generation. When connected to sources, it's quite accurate. Without source material, it can hallucinate at rates similar to other mid-tier models.

Benchmarks Used

HaluEval: 83/100

Fairness: B- (NEW)

Does it treat people differently?

80

Reasonable fairness with a focus on responsible AI principles.

Command R+ benefits from Cohere's emphasis on responsible AI. It handles most bias scenarios reasonably well, showing particular strength in avoiding harmful generalizations. Some subtle biases remain, particularly around occupational stereotypes.

Benchmarks Used

BBQ: 79/100
WinoBias: 81/100

Refusal to Harm: C+ (NEW)

Can you trick it into saying dangerous things?

75

Decent safety guardrails, though not as robust as the top tier.

Command R+ has reasonable safety training and refuses most obviously harmful requests. However, its adversarial robustness is in the middle of the pack — more sophisticated jailbreak attempts can sometimes get through its defenses.

Benchmarks Used

HarmBench: 76/100
AdvBench: 74/100

Manipulation Resistance: C+ (NEW)

Does it try to manipulate you?

77

Generally straightforward but not the most rigorous about flagging manipulation.

Command R+ doesn't proactively manipulate users and generally behaves ethically in conversations. Its main weakness is that it can sometimes be used to generate subtly manipulative content without including appropriate warnings.

Benchmarks Used

Privacy Respect: C+ (NEW)

Does it leak personal info?

76

Mid-range privacy protections with enterprise-focused design.

Command R+ has reasonable privacy protections, consistent with Cohere's enterprise focus. It generally respects privacy boundaries but can occasionally reproduce memorized personal information from training data when prompted creatively.

Benchmarks Used

Straight Talk: C+ (NEW)

Does it just tell you what you want to hear?

78

Moderately good at pushing back, but can still be a people-pleaser.

Command R+ shows moderate sycophancy levels. It will sometimes push back on incorrect user statements but can also be swayed by confident assertions. It strikes a middle ground between being agreeable and being accurate.
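The overall score of 78 shown at the top of this card is consistent with a plain average of the six category scores. The aggregation rule is not stated on this page, so the equal weighting below is an assumption that happens to reproduce the published number, not a description of the actual methodology. A minimal Python sketch of that check:

```python
from statistics import mean

# Category scores exactly as listed on this card.
category_scores = {
    "Honesty": 82,
    "Fairness": 80,
    "Refusal to Harm": 75,
    "Manipulation Resistance": 77,
    "Privacy Respect": 76,
    "Straight Talk": 78,
}

# Assumption: every category carries equal weight. The page does not
# confirm this; it is only one rule that matches the published total.
overall = mean(category_scores.values())

print(f"Unweighted mean of category scores: {overall}")
```

Under this assumption the six scores sum to 468, and 468 / 6 = 78, which matches the overall figure.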

Scores are based on publicly available benchmarks and are for educational purposes. They do not constitute endorsements or guarantees of safety.