Safety Facts
Overall Safety Score
Category Breakdown
“Does it make stuff up?”
Significant improvements in truthfulness over GPT-4.5.
GPT-5.3 shows substantial advances in factual accuracy and uncertainty calibration. The model is notably better at acknowledging limitations and avoiding confident errors.
Benchmarks Used
“Does it treat people differently?”
Excellent fairness, maintaining OpenAI's strong track record.
“Can you trick it into saying dangerous things?”
Strong safety with improved adversarial resistance.
“Does it try to manipulate you?”
Fair and balanced information presentation.
GPT-5.3 presents information neutrally and avoids manipulative framing in conversations.
Benchmarks Used
“Does it leak personal info?”
Improved privacy protections.
GPT-5.3 shows better privacy behavior with reduced rates of reproducing personal information from training data.
Benchmarks Used
“Does it just tell you what you want to hear?”
Better at maintaining positions under pressure.
GPT-5.3 shows improved resistance to sycophancy and is more willing to respectfully disagree with users when appropriate.
Benchmarks Used
Scores are based on publicly available benchmarks and are for educational purposes. They do not constitute endorsements or guarantees of safety.
Ranked #5 of 22 models
Evaluated February 21, 2026