How LLM medical data extraction is cleaning up messy hospital records
Source Publication: Journal of Bone and Joint Surgery
Primary Authors: Yang, Mulford, Girod-Hoffman et al.

Imagine trying to find a specific recipe in a giant, disorganised box of your grandmother's handwritten notes. An older computer programme acts like a rigid, robotic filing clerk.
It only looks for the exact phrase "chicken soup". If the author scribbled "poultry broth" or "bird stew" instead, the clerk completely misses it.
Now, imagine a trained chef reading those same notes. They understand context, read between the lines, and easily spot the recipe you need.
This is the core idea behind LLM medical data extraction, a method that could soon organise our messiest health records.
The problem with hospital paperwork
Doctors write detailed operative notes immediately after surgery. These files are full of vital details about the procedure, such as how a hip replacement was physically attached.
However, these notes are typically written as free text. They are notoriously messy, filled with medical jargon, and completely lack a standard format.
Hospitals desperately need this information for medical registries to track long-term patient outcomes. Manually reading thousands of files takes hundreds of hours of highly skilled labour.
Older natural language processing (NLP) algorithms try to automate this tedious task. But they rely on rigid rules, so they often fail when doctors use unexpected phrasing or shorthand.
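To see why rigid rules struggle, here is a minimal sketch in Python. The rule table, the function name and both example notes are invented for illustration; this is not the algorithm used in the study, only a toy version of the "filing clerk" idea.

```python
import re
from typing import Optional

# Hypothetical rules: each label is found only if one exact phrase appears.
APPROACH_RULES = {
    "posterior": re.compile(r"\bposterior approach\b", re.IGNORECASE),
    "anterior": re.compile(r"\bdirect anterior approach\b", re.IGNORECASE),
}


def extract_approach(note: str) -> Optional[str]:
    """Return the first label whose exact phrase appears in the note."""
    for label, pattern in APPROACH_RULES.items():
        if pattern.search(note):
            return label
    return None  # No rule matched, so the detail is missed.


# Invented example notes: one uses the expected wording, one uses shorthand.
standard_note = "The hip was exposed through a posterior approach."
shorthand_note = "Hip exposed via post. appr., short external rotators released."

print(extract_approach(standard_note))   # the rule matches the exact phrase
print(extract_approach(shorthand_note))  # the shorthand slips past the rule
```

Both notes describe the same thing, but only the first contains the phrase the rule is looking for. Adding more rules helps, yet every new abbreviation needs another one.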
A new approach to LLM medical data extraction
Researchers wanted to see if large language models could do a better job acting as the intelligent chef. They tested this using 1,000 operative notes from total hip replacement surgeries.
They asked both an older NLP algorithm and a customised LLM to find three specific details:
- The surgical approach used by the doctor.
- The bearing surface of the hip implant.
- The fixation technique used to secure the artificial joint.
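As a rough sketch of how such a request might be put to a language model, the Python below builds a prompt for the three details and checks the reply. Everything here is an assumption made for illustration: the label lists, the function names and the stand-in model are hypothetical, and the article does not describe how the study's LLM was actually prompted or customised.

```python
import json
from typing import Callable, Dict

# Illustrative label sets only; the study's real categories are not given here.
ALLOWED = {
    "surgical_approach": {"anterior", "lateral", "posterior", "unknown"},
    "bearing_surface": {
        "ceramic-on-ceramic",
        "ceramic-on-polyethylene",
        "metal-on-polyethylene",
        "unknown",
    },
    "fixation_technique": {"cemented", "hybrid", "uncemented", "unknown"},
}


def build_prompt(note: str) -> str:
    """Ask for the three details as a JSON object with fixed choices."""
    fields = "\n".join(
        f'- "{name}": one of {sorted(labels)}' for name, labels in ALLOWED.items()
    )
    return (
        "Read the operative note and reply with a JSON object containing:\n"
        f"{fields}\n\nOperative note:\n{note}"
    )


def parse_reply(reply: str) -> Dict[str, str]:
    """Keep only answers from the allowed lists; anything else becomes 'unknown'."""
    data = json.loads(reply)
    result = {}
    for name, labels in ALLOWED.items():
        value = str(data.get(name, "unknown")).lower()
        result[name] = value if value in labels else "unknown"
    return result


def extract(note: str, ask_model: Callable[[str], str]) -> Dict[str, str]:
    """ask_model is any function that sends a prompt to a model and returns text."""
    return parse_reply(ask_model(build_prompt(note)))


def fake_model(prompt: str) -> str:
    # Stand-in for a real model call, returning a canned reply for the demo.
    return (
        '{"surgical_approach": "Posterior", '
        '"bearing_surface": "ceramic-on-polyethylene", '
        '"fixation_technique": "glued"}'
    )


details = extract("Invented operative note text.", fake_model)
print(details)
```

Restricting answers to a fixed list is a design choice: an unexpected reply such as "glued" is recorded as "unknown" rather than slipping into the registry unchecked.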
The team measured the accuracy of both systems against a rigorous human review. The LLM beat the older rules-based algorithm across the board.
For the surgical approach, the LLM hit 96 per cent accuracy. It also scored 96 per cent for fixation technique, edging past the old system's 95 per cent.
The most significant gap was in finding the bearing surface. The LLM scored 89 per cent, while the older software managed just 74 per cent.
Impressively, the newer AI could even infer missing information using context clues. When notes were ambiguous, the LLM correctly figured out the bearing surface 80 per cent of the time.
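Accuracy figures like these come from comparing each system's answers with the human reviewers' answers, note by note. A minimal Python sketch of that sum is shown below; the labels are made up for the example and are not data from the study.

```python
from typing import List


def accuracy(predicted: List[str], reference: List[str]) -> float:
    """Share of notes where the system's answer matches the human review."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("Need two non-empty lists of the same length.")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Made-up labels for four notes (not taken from the study).
human_review = ["cemented", "uncemented", "hybrid", "uncemented"]
system_output = ["cemented", "uncemented", "uncemented", "uncemented"]

score = accuracy(system_output, human_review)
print(f"{score:.0%}")  # 3 of 4 answers match, so this prints 75%
```

Running the same function once per detail, and once per system, would give a table of scores to compare side by side.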
What this means for clinical care
This study suggests that AI could substantially reduce the time medical staff spend on administrative work. Rather than ticking boxes and reading old files, doctors could spend more of their time on patients.
Better data extraction also means higher quality medical registries. When researchers have accurate, easily accessible data, they can better track which surgical techniques actually work best.
The technology may eventually expand far beyond orthopaedic surgery. It could help organise electronic health records across entire national hospital networks.
We may be looking at a future where the messy box of medical notes is finally sorted. The intelligent chef is ready to read the recipes, and the hospital archive may never be the same.