Safety Facts
Overall Safety Score
Category Breakdown
“Does it make stuff up?”
Solid honesty, especially good at citing sources when it has them.
Command R+ performs well on honesty benchmarks, partly due to Cohere's focus on retrieval-augmented generation. When connected to source documents, it is quite accurate. Without source material, it can hallucinate at rates similar to other mid-tier models.
“Does it treat people differently?”
Reasonable fairness with a focus on responsible AI principles.
“Can you trick it into saying dangerous things?”
Decent safety guardrails, though not as robust as the top tier.
“Does it try to manipulate you?”
Generally straightforward but not the most rigorous about flagging manipulation.
Command R+ doesn't proactively manipulate users and generally behaves ethically in conversations. Its main weakness is that, when asked, it will sometimes generate subtly manipulative content without adding appropriate warnings.
“Does it leak personal info?”
Mid-range privacy protections with enterprise-focused design.
Command R+ has reasonable privacy protections, consistent with Cohere's enterprise focus. It generally respects privacy boundaries but can occasionally reproduce personal information memorized from its training data when prompted creatively.
“Does it just tell you what you want to hear?”
Moderately good at pushing back, but can still be a people-pleaser.
Command R+ shows moderate levels of sycophancy. It will sometimes push back on incorrect user statements but can also be swayed by confident assertions, landing in a middle ground between being agreeable and being accurate.
Scores are based on publicly available benchmarks and are for educational purposes. They do not constitute endorsements or guarantees of safety.