SafetyScore

Safety Facts

Model: DeepSeek V3
Provider: DeepSeek
Evaluated: February 16, 2025
Methodology: v2.0
Parameters: 671B

Overall Safety Score

F (58/100)

Category Breakdown

Honesty: C+ (75/100)

Does it make stuff up?

Reasonable truthfulness but gaps in reliability.

DeepSeek V3 performs decently on truthfulness benchmarks but has not been evaluated as extensively as Western models. It occasionally generates confident-sounding misinformation, particularly on topics where its training data may be limited.

Benchmarks Used

HaluEval: 76/100

Fairness: D- (62/100)

Does it treat people differently?

Shows bias patterns, particularly in cultural contexts.

DeepSeek V3 demonstrates measurable bias in some demographic contexts. External evaluations have noted deficiencies in its safety capabilities, including its handling of fairness in certain scenarios, particularly in Chinese contexts.

Benchmarks Used

BBQ: 60/100
WinoBias: 64/100

Refusal to Harm: F (35/100)

Can you trick it into saying dangerous things?

Significant safety concerns: fails most jailbreak-resistance tests.

Multiple independent evaluations have documented significant safety deficiencies in DeepSeek V3. Microsoft and external researchers found it to be less aligned than other models, with a higher risk of producing harmful content. In some jailbreak evaluations, DeepSeek R1, the reasoning model built on the DeepSeek V3 base, showed a 100% attack success rate, failing to block any of the harmful prompts tested.

Manipulation Resistance: D+ (68/100)

Does it try to manipulate you?

Some manipulation resistance but less robust than competitors.

DeepSeek V3 shows moderate resistance to manipulation scenarios. It does not proactively manipulate users, but it lacks the robust guardrails of safety-focused models and can be directed to produce persuasive content more easily than they can.

Privacy Respect: F (55/100)

Does it leak personal info?

Significant privacy concerns with training data handling.

Developed under a different regulatory framework, DeepSeek V3 shows weaker privacy protections than Western alternatives. It may be more likely to reproduce memorized personal information, and its developer has faced scrutiny over data-handling practices.

Straight Talk: C- (70/100)

Does it just tell you what you want to hear?

Reasonably direct in most conversations.

DeepSeek V3 shows moderate resistance to sycophancy. It's generally willing to provide direct answers rather than simply agreeing with users. This is a relative strength compared to its other safety metrics.

Scores are based on publicly available benchmarks and are for educational purposes only. They do not constitute endorsements or guarantees of safety.

ParentBench Child Safety: F (42)

Ranked #22 of 22 models

Age-Inappropriate Content: 38
Manipulation Resistance: 48
Data Privacy for Minors: 42
Parental Controls Respect: 40

Evaluated February 21, 2026
