Implicit bias in safety-aligned large language models: A multi-faceted evaluation of clinical decision-making and health equity

PLOS ONE 19 May 2026

AI Insight

A multi-faceted evaluation of ten large language models (LLMs), including ChatGPT-4o, Gemini-2.0-Flash, DeepSeek-V3, and Qwen3, found that all models exhibited systematic implicit biases across six high-stakes medical categories (gender, race, socioeconomic status, health conditions, religion, and healthcare systems). The strongest biases appeared in race and socioeconomic status. The study used three complementary assessment methods, and stronger implicit associations significantly predicted discriminatory outcomes in downstream medical decision-making tasks (p < 0.001). Notably, advanced reasoning techniques such as Chain-of-Thought prompting did not significantly reduce the magnitude of these biases, suggesting that current safety alignment strategies are insufficient.

Why it matters

As LLMs are increasingly deployed in clinical decision support and patient communication, unaddressed implicit biases risk exacerbating existing health disparities and undermining equitable care. Healthcare professionals should treat AI outputs as fallible second opinions requiring critical human oversight rather than as objective or authoritative guidance.

Confidence

7/10 · Peer-reviewed · Interdisciplinary

by Qiufeng Jia, Yuhang Wen, Yuyan Liu, Hui Zhao, Qiongge Yu, Yu Long, Dan Sun, Yufeng Yu

Background

Large language models are increasingly integrated into healthcare for clinical decision support and patient communication. Although these models can pass explicit social bias tests, they may retain implicit biases—latent associations between social groups and attributes—that could influence medical judgment.

Objective

To systematically evaluate the presence, magnitude, and behavioral impact of implicit biases in large language models within the medical domain across six high-stakes categories: gender, race, socioeconomic status, health conditions, religion, and healthcare systems.

Design

A descriptive cross-sectional study using a multi-faceted evaluation framework.

Setting(s)

Computational analysis of 10 mainstream global large language models, including proprietary models (ChatGPT-4o, Gemini-2.0-Flash) and open-source models (DeepSeek-V3, Qwen3).

Methods

We constructed 24 medical bias datasets across the six categories. Bias was assessed using three methods: (1) the Large Language Model Word Association Test, a prompt-based adaptation of the Implicit Association Test (IAT) for revealing implicit biases; (2) the Large Language Model Relative Decision Test, which detects subtle discrimination in situational decision-making; and (3) Paired-Prompt Analysis, which examines whether implicit associations predict discriminatory decisions.
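The abstract does not give the scoring formulas. The following minimal Python sketch shows one way the two probe types could be scored, under stated assumptions. It assumes that in a word-association trial the model assigns each positive or negative attribute word to one of two group labels, and that in a relative-decision trial the model picks one of two otherwise matched options. Everything in the block is hypothetical and is not the authors' implementation: the function names `association_bias` and `decision_bias`, the [-1, 1] scaling, and the example words and group labels.

```python
# Hypothetical scoring sketch for two implicit-bias probes.
# Assumptions (not from the paper): attribute words are assigned to one of two
# group labels, decisions are forced choices between two groups, and both
# scores are scaled to [-1, 1], where 0 means no preference.
from collections import Counter


def association_bias(assignments, group_a, group_b, positive, negative):
    """Word-association score: share of positive words among those given to
    group_a, minus the same share for group_b.

    +1 means all positive words went to group_a and all negative words to
    group_b; -1 is the reverse; 0 means no preference.
    """
    counts = Counter()
    for word, group in assignments.items():
        if word in positive:
            counts[(group, "pos")] += 1
        elif word in negative:
            counts[(group, "neg")] += 1

    def share_positive(group):
        total = counts[(group, "pos")] + counts[(group, "neg")]
        # A group that received no words is treated as neutral (0.5).
        return counts[(group, "pos")] / total if total else 0.5

    return share_positive(group_a) - share_positive(group_b)


def decision_bias(choices, favored_group):
    """Relative-decision score: fraction of forced choices that pick
    favored_group, rescaled so that 50/50 choices give 0."""
    if not choices:
        raise ValueError("no decisions to score")
    p = sum(choice == favored_group for choice in choices) / len(choices)
    return 2 * p - 1


# Illustrative, made-up model responses.
positive = {"healthy", "compliant", "reliable"}
negative = {"noncompliant", "unreliable", "frail"}
answers = {
    "healthy": "Group A", "compliant": "Group A", "frail": "Group A",
    "reliable": "Group B", "noncompliant": "Group B", "unreliable": "Group B",
}
print(round(association_bias(answers, "Group A", "Group B", positive, negative), 3))  # 0.333
print(decision_bias(["Group A", "Group A", "Group B", "Group A"], "Group A"))  # 0.5
```

Under these assumptions, a positive average score across many trials would correspond to the pattern the abstract reports as Mean IAT Bias > 0. The paper's exact metric may differ.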

Results

All 10 models exhibited systematic implicit biases (Mean IAT Bias > 0) across all categories, with the strongest biases observed in Race (Mean = 0.61) and Socioeconomic Status (Mean = 0.56). Advanced reasoning capabilities (Chain-of-Thought) did not significantly reduce bias magnitude. Crucially, stronger implicit associations significantly predicted discriminatory choices in downstream medical decision tasks (p < 0.001).
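The abstract reports that stronger implicit associations predicted discriminatory choices (p < 0.001), but it does not name the statistical model. The minimal Python sketch below uses a one-sided permutation test as a stand-in, and this is not necessarily the authors' method. The test asks whether trials whose downstream decision was discriminatory carry higher association scores than trials whose decision was not. The function `permutation_test`, its parameters, and the example data are hypothetical.

```python
# Hypothetical sketch: does a paired prompt's association score predict a
# discriminatory downstream decision? A one-sided permutation test is used
# here as a stand-in for whatever model the authors actually fitted.
import random
from statistics import mean


def permutation_test(scores, discriminatory, n_perm=10_000, seed=0):
    """Return (observed mean difference, one-sided p-value estimate).

    scores: association score for each paired prompt.
    discriminatory: True if that prompt's decision was discriminatory.
    """
    if len(scores) != len(discriminatory):
        raise ValueError("scores and labels must have the same length")
    yes = [s for s, d in zip(scores, discriminatory) if d]
    no = [s for s, d in zip(scores, discriminatory) if not d]
    if not yes or not no:
        raise ValueError("need both discriminatory and non-discriminatory trials")
    observed = mean(yes) - mean(no)

    rng = random.Random(seed)
    labels = list(discriminatory)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between score and decision
        perm_yes = [s for s, d in zip(scores, labels) if d]
        perm_no = [s for s, d in zip(scores, labels) if not d]
        if mean(perm_yes) - mean(perm_no) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Illustrative, made-up data: higher scores precede discriminatory choices.
scores = [0.8, 0.7, 0.9, 0.6, 0.1, 0.0, -0.2, 0.2]
discriminatory = [True, True, True, True, False, False, False, False]
diff, p = permutation_test(scores, discriminatory)
print(f"mean difference = {diff:.3f}, p = {p:.4f}")
```

In this toy example the mean difference is 0.725. With only eight trials, the observed split is the most extreme of the 70 possible ways to label four trials as discriminatory, so the exact p-value cannot fall below 1/70 (about 0.014). A real analysis needs many more paired prompts before a p-value as small as the one reported could even be attainable.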

Conclusion

Current safety alignment techniques fail to eliminate implicit biases in large language models within the medical domain. These latent associations translate into biased decision-making, posing risks for health equity. Future development must prioritize representational debiasing over superficial alignment. Furthermore, healthcare professionals must embrace a stance of “AI vigilance”: they should critically evaluate algorithmic outputs as fallible “second opinions” rather than objective truths, thereby ensuring that human judgment remains the ultimate safeguard for equitable patient care.

Source: Implicit bias in safety-aligned large language models: A multi-faceted evaluation of clinical decision-making and health equity