Computer Science & AI | 24 February 2026
Auditing Algorithmic Bias in Healthcare: A New Method for Analysing Clinical Text
Source Publication: Communications Medicine
Primary Authors: Abulibdeh, Lin, Ahmadi et al.

These results were observed under controlled laboratory conditions, so real-world performance may differ.
The Roots of Algorithmic Bias in Healthcare
To build equitable clinical AI systems, developers must train them on representative patient data. However, clinicians frequently omit demographic information, or document it inconsistently, during routine care.
Without accurate baseline data, engineers cannot measure the skewed outputs that disproportionately affect minority patients. This missing information creates a blind spot for hospital administrators trying to audit their internal systems.
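This blind spot can be made concrete: any record whose demographic field was never documented simply drops out of a group-level audit, so disparities in that subset go unmeasured. A minimal sketch in Python (hypothetical data and field layout, not code from the study):

```python
from collections import defaultdict

def group_error_rates(y_true, y_pred, groups):
    """Per-group error rate; records with no recorded group are unauditable."""
    errors = defaultdict(int)
    counts = defaultdict(int)
    missing = 0
    for t, p, g in zip(y_true, y_pred, groups):
        if g is None:  # demographic field was never documented
            missing += 1
            continue
        counts[g] += 1
        errors[g] += int(t != p)
    rates = {g: errors[g] / counts[g] for g in counts}
    return rates, missing

# Toy audit: a third of records lack demographics and vanish from the audit.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
groups = ["A", "A", "B", None, None, "B"]
print(group_error_rates(y_true, y_pred, groups))  # ({'A': 0.0, 'B': 0.5}, 2)
```

Note that the two unaudited records here contain the only remaining errors; whatever disparity they carry is invisible to the audit.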
Comparing the Models
The investigation tested a new hierarchical convolutional neural network against four standard transformer-based deep learning models. While transformer models process data in parallel via self-attention, they routinely fail to capture the complex, multilevel structure of medical narratives.
Conversely, the new hierarchical approach maps directly to how clinical notes are actually organised. The study measured a macro F1 score of 98.4% for the hierarchical model, outperforming all transformer variants in both baseline accuracy and performance equity.
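Macro F1, the headline metric here, is the unweighted mean of per-class F1 scores, so rare diagnostic categories count as much as common ones; that property is one reason it pairs naturally with equity analysis. A minimal pure-Python illustration (not the study's evaluation code):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# A majority-only predictor scores 80% accuracy here but a far lower macro F1,
# because the minority class contributes an F1 of zero.
print(round(macro_f1([0, 0, 0, 0, 1], [0, 0, 0, 0, 0]), 3))  # 0.444
```

This is also why a model that collapses to majority-only predictions, as one did under the fairness constraint, is penalised heavily by macro F1 even when raw accuracy looks acceptable.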
When the team applied a mathematical fairness constraint—a standard industry fix meant to mitigate disparities—the results were highly mixed. The data showed that fairness interventions are strictly model-dependent.
- The constraint improved parity across most standard transformer architectures.
- It actively degraded the performance of the superior hierarchical model.
- It caused one specific clinical model to collapse completely, forcing it to output only majority predictions.
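The article does not name the exact constraint used, but one common post-hoc fix adjusts per-group decision thresholds so that each group receives positive predictions at roughly the same rate (demographic parity). A toy sketch under that assumption, for scored binary predictions:

```python
def parity_thresholds(scores, groups, target_rate):
    """Per-group score cutoffs so each group's positive-prediction rate is
    roughly target_rate. One common post-hoc fairness fix (demographic parity);
    assumed here for illustration, not necessarily the study's method."""
    by_group = {}
    for s, g in zip(scores, groups):
        by_group.setdefault(g, []).append(s)
    cutoffs = {}
    for g, ss in by_group.items():
        ss = sorted(ss, reverse=True)
        k = max(1, round(target_rate * len(ss)))  # how many positives to allow
        cutoffs[g] = ss[k - 1]  # predict positive when score >= cutoff
    return cutoffs

# Two groups with different score distributions get different cutoffs.
scores = [0.9, 0.8, 0.2, 0.1, 0.95, 0.4, 0.3, 0.05]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(parity_thresholds(scores, groups, 0.5))  # {'A': 0.8, 'B': 0.4}
```

When a model's scores for one group are nearly constant, a forced per-group cutoff can tip every prediction in that group to a single side, which is one mechanism by which a hard constraint can degrade or collapse an otherwise strong model.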
What Remains Unsolved
Despite these high performance metrics, the study does not solve the root cause of demographic disparities. The researchers observed persistent inequities across intersections of race, sex, and age regardless of the model used.
This suggests that mathematical tweaks cannot erase the systemic biases embedded in the original human documentation. If a doctor writes about patients differently based on their background, the AI will inevitably encode those differences. The algorithm simply mirrors upstream human behaviour.
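Intersectional gaps of the kind reported can be surfaced by slicing error rates over combinations of attributes rather than one attribute at a time. A minimal sketch (hypothetical attribute tuples, illustration only):

```python
def intersectional_error_rates(y_true, y_pred, attrs):
    """Error rate for every observed intersection of demographic attributes."""
    stats = {}
    for t, p, a in zip(y_true, y_pred, attrs):
        err, n = stats.get(a, (0, 0))
        stats[a] = (err + int(t != p), n + 1)
    return {k: e / n for k, (e, n) in stats.items()}

# Hypothetical (race, sex, age-band) tuples; labels are placeholders.
attrs = [("B", "F", "65+"), ("B", "F", "65+"), ("W", "M", "18-40")]
print(intersectional_error_rates([1, 0, 1], [0, 0, 1], attrs))
# {('B', 'F', '65+'): 0.5, ('W', 'M', '18-40'): 0.0}
```

A caveat consistent with the article's point: intersectional cells shrink quickly as attributes multiply, so small subgroups need care before a rate difference is read as a real disparity.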
Future Outlook
This framework offers a highly scalable method for auditing clinical databases. By accurately predicting missing demographics, healthcare providers could soon assess their own algorithms for hidden prejudices.
However, the finding that fairness interventions can destroy model performance serves as a strict warning. Developers can no longer rely on one-size-fits-all fairness patches when designing medical AI. Future systems will require bespoke interventions tailored to the specific architecture of the neural network.
Cite this Article (Harvard Style)
Abulibdeh et al. (2026) 'Integration of fairness-awareness into clinical language processing models', Communications Medicine. Available at: https://doi.org/10.1038/s43856-026-01433-9