CMIMI: Rules-based approach improves ChatGPT's protocoling performance

BOSTON -- Giving ChatGPT-4 specific instructions can improve the large language model’s accuracy in predicting radiology protocols, according to findings presented at the Conference on Machine Intelligence in Medical Imaging (CMIMI).

In a poster presentation at the Society for Imaging Informatics in Medicine's (SIIM)-hosted gathering, a team led by Kartik Gupta from Western University in London, Ontario, Canada, found that a rules-based approach and custom prompting improved ChatGPT-4’s protocoling performance for chest CT imaging.

“Radiologists want to spend more time doing imaging tasks than non-imaging tasks,” Gupta told AuntMinnie.com. “We wanted to find a way to maybe validate large language models to see if we can automate this process instead of using traditional natural language processing techniques.”

Imaging volumes for CT and MRI exams have increased over the last decade, and this trend isn’t expected to stop anytime soon. That means radiologists will face more protocoling work, which can be a time-consuming task.

While researchers continue to explore effective ways to automate protocoling with AI tools, large language models in recent years have emerged as a potential choice. Gupta said that the ability of these models to answer medical questions and perform protocoling tasks could prove useful for radiologists.

Gupta and colleagues put this to the test with one such large language model, ChatGPT-4. They used a dataset of 796 labeled chest CT imaging requests from Victoria Hospital in London, Ontario. The team identified four types of protocols: with contrast, without contrast, interstitial, and low-dose. From there, it tested four different prompts with ChatGPT-4: control, classification rules, ablated, and refined.
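The article does not reproduce the study's prompts or rules, so the following is only a minimal sketch of how a control prompt and a rules-based prompt for this four-way classification might be assembled. The function names, the prompt wording, and the placeholder rules are all hypothetical; a real rule set would come from an institution's own radiologists.

```python
from typing import List, Optional

# The four protocol labels named in the article.
PROTOCOLS = ["with contrast", "without contrast", "interstitial", "low-dose"]

# Placeholder rules only (assumption): the study's actual rules are not published here.
EXAMPLE_RULES = [
    "<site-specific rule that maps an indication to 'with contrast'>",
    "<site-specific rule that maps an indication to 'low-dose'>",
]


def build_prompt(requisition: str, rules: Optional[List[str]] = None) -> str:
    """Build a zero-shot prompt; omitting `rules` gives a control-style prompt."""
    lines = [
        "You are assisting with protocoling chest CT imaging requests.",
        "Choose exactly one protocol from: " + "; ".join(PROTOCOLS) + ".",
    ]
    if rules:
        lines.append("Apply these classification rules:")
        lines.extend(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    lines.append("Requisition: " + requisition.strip())
    lines.append("Answer with the protocol name only.")
    return "\n".join(lines)


def parse_protocol(reply: str) -> Optional[str]:
    """Map a free-text model reply to one protocol label, or None if unclear."""
    text = reply.strip().lower()
    matches = [name for name in PROTOCOLS if name in text]
    # Accept the reply only when exactly one label is mentioned.
    return matches[0] if len(matches) == 1 else None
```

Sending the prompt to a model is left out on purpose; the sketch only shows that the control and rules-based conditions can differ by nothing more than the rules block, which is what makes an ablation comparison straightforward.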

Of the total imaging requests, 605 exams involved contrast media, 97 were without contrast, 27 were interstitial, and 67 were low-dose CT requests.
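As a quick check on these figures, the short calculation below sums the four counts and works out each protocol's share of the dataset. The variable names are illustrative; the counts are the ones reported above.

```python
# Request counts per protocol, as reported in the article.
counts = {
    "with contrast": 605,
    "without contrast": 97,
    "interstitial": 27,
    "low-dose": 67,
}

total = sum(counts.values())  # 796, matching the stated dataset size
shares = {name: n / total for name, n in counts.items()}

# Accuracy of a trivial baseline that always predicts the most common protocol.
majority_label = max(counts, key=counts.get)
majority_baseline = counts[majority_label] / total

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"always '{majority_label}': {majority_baseline:.3f}")
```

The classes are heavily imbalanced: contrast exams make up about 76% of requests, so always answering "with contrast" would already reach roughly 0.76 accuracy. That is useful context when reading accuracy figures for any prompt.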

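The researchers found that the rules-based approach led to significant improvement in protocoling performance: accuracy rose from 0.79 with the control prompt to 0.88 with classification rules and 0.90 with the enhanced rules prompt (p < 0.001 for each versus control).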

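Performance of ChatGPT-4 with rules-based approach for chest CT exams

| Prompt                        | Accuracy | Precision | Recall | F1 score | p-value vs. control | p-value vs. classification rules |
|-------------------------------|----------|-----------|--------|----------|---------------------|----------------------------------|
| Control                       | 0.79     | 0.61      | 0.79   | 0.66     | N/A                 |                                  |
| Classification rules          | 0.88     | 0.77      | 0.89   | 0.82     | < 0.001             |                                  |
| Classification rules-ablation | 0.85     | 0.73      | 0.77   | 0.70     | < 0.001             | 0.002                            |
| Classification rules-enhanced | 0.90     | 0.80      | 0.89   | 0.84     | < 0.001             | < 0.001                          |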
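For readers who want to run a similar evaluation on their own requisitions, here is a minimal sketch of how accuracy and per-protocol precision, recall, and F1 can be computed from true and predicted labels. The article does not say how the study averaged these metrics across the four protocols, so the sketch stops at per-class values. The function names and the four-item toy data are made up for illustration and are not study data.

```python
def accuracy(y_true, y_pred):
    """Fraction of requests whose predicted protocol matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} using one-vs-rest counts."""
    results = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results[label] = (precision, recall, f1)
    return results


# Toy example (hypothetical labels, not study data).
y_true = ["with contrast", "with contrast", "with contrast", "without contrast"]
y_pred = ["with contrast", "with contrast", "without contrast", "without contrast"]

acc = accuracy(y_true, y_pred)
metrics = per_class_metrics(y_true, y_pred, ["with contrast", "without contrast"])
```

With imbalanced protocols, per-class values are worth inspecting alongside any averaged figure, since a rare protocol can perform poorly without moving overall accuracy much.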

Gupta said that large language models allow for zero-shot protocol prediction, which does not require large datasets and improves radiology workflow. He added that this approach can improve protocoling efficiency in radiology departments.

“If we can provide a rules-based system to different institutions, they can look at their own recs, look at the nuances of the requisitions from the doctors at their hospitals, and maybe create their own rules set and form very quick validation across protocols,” he told AuntMinnie.com.

Next steps for the researchers include validating and testing more protocols from other sites and doing so with larger datasets, Gupta added.
