Assessing the Utility of Large Language Models in Radiology Triage
Source Publication: European Radiology
Primary Authors: Umeno, Nishio, Matsuo et al.

This investigation reports that fine-tuned open-source language models can identify high-priority imaging reports with high accuracy, yet its scope is strictly limited to a controlled dataset of under 2,100 total cases. As the integration of large language models in radiology accelerates, distinguishing between theoretical capability and clinical reliability becomes essential.
The researchers trained four variations of the Llama model family. They utilised 1,906 reports for training and reserved 176 for testing. A significant methodological constraint is the 1:1 ratio of high-priority to non-high-priority reports in the test set. In a live hospital environment, critical findings are the exception, not the norm. Therefore, the high accuracy metrics reported here may not hold up against the 'class imbalance' found in actual practice. Evaluating these tools requires scepticism.
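The effect of class imbalance can be illustrated with a short calculation. The sketch below is not taken from the study: it assumes, purely for illustration, that sensitivity and specificity are both 0.915 (one combination consistent with 91.5% accuracy on a balanced test set; the true split is not stated here) and that they stay fixed as prevalence changes. The prevalence values and the function name are likewise hypothetical.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of flagged reports that are truly high-priority (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Assumption: sensitivity == specificity == 0.915. This is an illustrative
# guess, not a figure reported by the study.
SENS = 0.915
SPEC = 0.915

# Hypothetical prevalences: the balanced test set (50%) versus two
# increasingly rare rates of high-priority reports.
for prevalence in (0.50, 0.10, 0.02):
    ppv = positive_predictive_value(SENS, SPEC, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.3f}")
```

Under these assumptions, the share of flagged reports that are genuinely high-priority falls from 0.915 on a balanced set to roughly 0.54 at 10% prevalence and roughly 0.18 at 2% prevalence, even though sensitivity and specificity are unchanged.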
Performance of Large Language Models in Radiology
The data indicate that the Llama3 Elyza 8B model performed best. When provided with the radiological findings and the referring department, it achieved an area under the receiver operating characteristic curve (ROC AUC) of 0.968 and an accuracy of 91.5%. Surprisingly, feeding the model additional data, specifically the clinical diagnosis and examination request details, did not yield better results. This implies the model extracts sufficient signal from the findings and the referring department, rendering the extra clinical context superfluous in this specific experimental design.
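For readers unfamiliar with the two metrics, the following minimal sketch shows how each is computed from a model's scores. The labels and scores are invented toy values, not data from the study, and the 0.5 decision threshold is an assumption.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of reports classified correctly at a fixed score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: 1 = high-priority, 0 = not high-priority,
# balanced 1:1 like the study's test set.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

print(f"ROC AUC  = {roc_auc(labels, scores):.3f}")
print(f"accuracy = {accuracy(labels, scores):.3f}")
```

ROC AUC summarises ranking quality across all thresholds, whereas accuracy depends on one chosen threshold and on the class mix of the test set, which is why the two figures can diverge.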
While the study reports impressive classification metrics, it supports only a potential role in communication support. The leap from a balanced test set to a chaotic clinical workflow is substantial. Until these tools are validated on large-scale, consecutive patient cohorts, their role remains experimental. Safety demands direct evidence that rare, urgent cases are not missed amidst the noise of routine scans.