Computer Science & AI | 24 February 2026
Auditing Algorithmic Bias in Healthcare: A New Method for Analysing Clinical Text
Source Publication: Communications Medicine
Primary Authors: Abulibdeh, Lin, Ahmadi et al.

These results were observed under controlled laboratory conditions, so real-world performance may differ.
The Roots of Algorithmic Bias in Healthcare
To build equitable clinical AI systems, developers must train them on representative patient data. However, clinicians frequently omit demographic information, or document it inconsistently, during routine care.
Without accurate baseline data, engineers cannot measure the skewed outputs that disproportionately affect minority patients. This missing information creates a blind spot for hospital administrators trying to audit their internal systems.
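This blind spot can be made concrete: any record whose demographic field was never documented simply drops out of a group-level audit, so disparities in that subset go unmeasured. A minimal sketch in Python (hypothetical data and field layout, not code from the study):

```python
from collections import defaultdict

def group_error_rates(y_true, y_pred, groups):
    """Per-group error rate; records with no recorded group are unauditable."""
    errors = defaultdict(int)
    counts = defaultdict(int)
    missing = 0
    for t, p, g in zip(y_true, y_pred, groups):
        if g is None:  # demographic field was never documented
            missing += 1
            continue
        counts[g] += 1
        errors[g] += int(t != p)
    rates = {g: errors[g] / counts[g] for g in counts}
    return rates, missing

# Toy audit: a third of records lack demographics and vanish from the audit.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
groups = ["A", "A", "B", None, None, "B"]
print(group_error_rates(y_true, y_pred, groups))  # ({'A': 0.0, 'B': 0.5}, 2)
```

Note that the two unaudited records here contain the only remaining errors; whatever disparity they carry is invisible to the audit.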
Comparing the Models
The investigation tested a new hierarchical convolutional neural network against four standard transformer-based deep learning models. While transformer models process data in parallel via self-attention, they routinely fail to capture the complex, multilevel structure of medical narratives.
Conversely, the new hierarchical approach maps directly to how clinical notes are actually organised. The study measured a macro F1 score of 98.4% for the hierarchical model, outperforming all transformer variants in both baseline accuracy and performance equity.
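Macro F1, the headline metric here, is the unweighted mean of per-class F1 scores, so rare diagnostic categories count as much as common ones; that property is one reason it pairs naturally with equity analysis. A minimal pure-Python illustration (not the study's evaluation code):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# A majority-only predictor scores 80% accuracy here but a far lower macro F1,
# because the minority class contributes an F1 of zero.
print(round(macro_f1([0, 0, 0, 0, 1], [0, 0, 0, 0, 0]), 3))  # 0.444
```

This is also why a model that collapses to majority-only predictions, as one did under the fairness constraint, is penalised heavily by macro F1 even when raw accuracy looks acceptable.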
When the team applied a mathematical fairness constraint—a standard industry fix meant to mitigate disparities—the results were highly mixed. The data showed that fairness interventions are strictly model-dependent.
- The constraint improved parity across most standard transformer architectures.
- It actively degraded the performance of the superior hierarchical model.
- It caused one specific clinical model to collapse completely, forcing it to output only majority predictions.
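The article does not name the exact constraint used, but one common post-hoc fix adjusts per-group decision thresholds so that each group receives positive predictions at roughly the same rate (demographic parity). A toy sketch under that assumption, for scored binary predictions:

```python
def parity_thresholds(scores, groups, target_rate):
    """Per-group score cutoffs so each group's positive-prediction rate is
    roughly target_rate. One common post-hoc fairness fix (demographic parity);
    assumed here for illustration, not necessarily the study's method."""
    by_group = {}
    for s, g in zip(scores, groups):
        by_group.setdefault(g, []).append(s)
    cutoffs = {}
    for g, ss in by_group.items():
        ss = sorted(ss, reverse=True)
        k = max(1, round(target_rate * len(ss)))  # how many positives to allow
        cutoffs[g] = ss[k - 1]  # predict positive when score >= cutoff
    return cutoffs

# Two groups with different score distributions get different cutoffs.
scores = [0.9, 0.8, 0.2, 0.1, 0.95, 0.4, 0.3, 0.05]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(parity_thresholds(scores, groups, 0.5))  # {'A': 0.8, 'B': 0.4}
```

When a model's scores for one group are nearly constant, a forced per-group cutoff can tip every prediction in that group to a single side, which is one mechanism by which a hard constraint can degrade or collapse an otherwise strong model.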
What Remains Unsolved
Despite these high performance metrics, the study does not solve the root cause of demographic disparities. The researchers observed persistent inequities across intersections of race, sex, and age regardless of the model used.
This suggests that mathematical tweaks cannot erase the systemic biases embedded in the original human documentation. If a doctor writes about patients differently based on their background, the AI will inevitably encode those differences. The algorithm simply mirrors upstream human behaviour.
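Intersectional gaps of the kind reported can be surfaced by slicing error rates over combinations of attributes rather than one attribute at a time. A minimal sketch (hypothetical attribute tuples, illustration only):

```python
def intersectional_error_rates(y_true, y_pred, attrs):
    """Error rate for every observed intersection of demographic attributes."""
    stats = {}
    for t, p, a in zip(y_true, y_pred, attrs):
        err, n = stats.get(a, (0, 0))
        stats[a] = (err + int(t != p), n + 1)
    return {k: e / n for k, (e, n) in stats.items()}

# Hypothetical (race, sex, age-band) tuples; labels are placeholders.
attrs = [("B", "F", "65+"), ("B", "F", "65+"), ("W", "M", "18-40")]
print(intersectional_error_rates([1, 0, 1], [0, 0, 1], attrs))
# {('B', 'F', '65+'): 0.5, ('W', 'M', '18-40'): 0.0}
```

A caveat consistent with the article's point: intersectional cells shrink quickly as attributes multiply, so small subgroups need care before a rate difference is read as a real disparity.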
Future Outlook
This framework offers a highly scalable method for auditing clinical databases. By accurately predicting missing demographics, healthcare providers could soon assess their own algorithms for hidden prejudices.
However, the finding that fairness interventions can destroy model performance serves as a strict warning. Developers can no longer rely on one-size-fits-all fairness patches when designing medical AI. Future systems will require bespoke interventions tailored to the specific architecture of the neural network.
Cite this Article (Harvard Style)
Abulibdeh et al. (2026) 'Integration of fairness-awareness into clinical language processing models', Communications Medicine. Available at: https://doi.org/10.1038/s43856-026-01433-9