AI Model Rapidly Deciphers Breast Cancer Pathology Reports
Source Publication: Scientific Reports
Primary Authors: Kwok, Arbour, Zhang et al.

Manually extracting data from thousands of medical records is slow, laborious work that creates a bottleneck for clinical research. To address this, researchers developed an automated system using Natural Language Processing (NLP), a branch of AI that enables computers to interpret human language.
They tested four different machine learning models on a set of 1,795 breast cancer pathology reports. The standout performer was a model called PubMedBERT, which became even more accurate after additional training on a general question-answering dataset.
The final model achieved an overall accuracy of 97.4% in extracting key details from the complex reports, surpassing the 95.6% accuracy of a previous rule-based algorithm. The result gives researchers a reliable and scalable tool. By automating this crucial data-gathering step, the new system promises to enhance research efficiency and could ultimately help to improve clinical outcomes for patients.
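
To make the comparison concrete, here is a minimal Python sketch of what a rule-based extractor and an overall accuracy score might look like. It is not the study's code: the field names, the regular expressions, and the three sample reports are all invented for illustration, and treating accuracy as the fraction of correctly extracted fields is one plausible definition, which the summary above does not confirm.

```python
import re

# Hypothetical patterns for two illustrative fields; a real rule-based
# system would need many more rules to cover variations in report phrasing.
PATTERNS = {
    "er_status": re.compile(r"estrogen receptor[^.]*?\b(positive|negative)\b", re.I),
    "grade": re.compile(r"\bgrade\s*[:\-]?\s*([1-3])\b", re.I),
}


def extract_fields(report: str) -> dict:
    """Return the first match for each field, or None when no rule fires."""
    result = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(report)
        result[field] = match.group(1).lower() if match else None
    return result


def overall_accuracy(predictions: list, gold: list) -> float:
    """Fraction of (report, field) pairs where the prediction equals the label."""
    total = 0
    correct = 0
    for pred, truth in zip(predictions, gold):
        for field, expected in truth.items():
            total += 1
            if pred.get(field) == expected:
                correct += 1
    return correct / total if total else 0.0


# Made-up toy reports, not drawn from the study's dataset.
reports = [
    "Invasive ductal carcinoma, grade 2. Estrogen receptor: positive.",
    "Nottingham grade: 3. Estrogen receptor status is negative.",
    "Tumor is ER+ and histologic grade II.",  # phrasing the rules do not cover
]
gold = [
    {"er_status": "positive", "grade": "2"},
    {"er_status": "negative", "grade": "3"},
    {"er_status": "positive", "grade": "2"},
]

predictions = [extract_fields(r) for r in reports]
accuracy = overall_accuracy(predictions, gold)
print(f"Overall accuracy on the toy set: {accuracy:.1%}")
```

The third toy report shows why hand-written rules can fall short: abbreviations such as "ER+" or Roman-numeral grades slip past the patterns unless someone writes a rule for each variant, whereas a language model trained on such reports may handle them without explicit rules.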