CMIMI: ChatGPT 'reasonably' accurate for identifying chest x-ray terms

ChatGPT-4 is reasonably accurate in identifying radiological terms on chest x-ray reports, according to research presented at the Conference on Machine Intelligence in Medical Imaging (CMIMI).

A team led by Jad Alsheikh from Creighton University in Omaha, NE, also found that the large language model can account for variations in terminology, but its performance varies for less common terms. The group's results were highlighted in a poster presentation at the Society for Imaging Informatics in Medicine (SIIM)-hosted meeting.

“This study underscores the potential of GPT-4 in enhancing patient understanding of radiological reports by providing a reliable database of terms,” the Alsheikh team wrote.

Researchers continue to explore the potential of large language models such as ChatGPT for aiding radiologists with tasks such as medical report generation. Successfully automating imaging report generation could free up radiologists' time for imaging exams and patient consultation. Furthermore, some radiology reports contain technical language that patients may not easily understand.

Alsheikh and colleagues evaluated ChatGPT-4’s accuracy in identifying common radiological terms from chest x-ray reports. They wanted to assess the model’s potential for creating a database for highlighting and defining medical terms.

The researchers generated two lists of the 40 most common radiological chest x-ray findings and phrases. The first list was drawn from 3,999 reports from the Open-i service of the National Library of Medicine. GPT-4 generated the second list, identifying what it believed to be the 40 most common phrases and findings.

The team reported that GPT-4 could accurately identify common terms. Terms with high exact match proportions when comparing the two lists included pleural effusion (100%), pulmonary edema (100%), and pneumothorax (73.7%).

The precision, recall, and F1 score each amounted to 0.3, indicating moderate overlap between the terms identified by GPT-4 and those in the chest x-ray reports. Additionally, the team observed a Spearman's rank correlation of 0.32, suggesting a weak correlation between the term-frequency rankings of GPT-4's list and the reports.
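As a rough illustration only (this is not the study's code, and the term lists below are made up), overlap metrics of this kind can be computed by treating the two lists as sets and comparing frequency ranks for the shared terms:

```python
from scipy.stats import spearmanr

# Hypothetical example lists -- not the study's actual data.
report_terms = ["pleural effusion", "pulmonary edema", "pneumothorax",
                "cardiomegaly", "atelectasis", "consolidation"]
gpt4_terms = ["pleural effusion", "pulmonary edema", "pneumothorax",
              "pneumonia", "rib fracture", "atelectasis"]

# Precision/recall/F1 treat the two lists as sets of terms.
overlap = set(report_terms) & set(gpt4_terms)
precision = len(overlap) / len(gpt4_terms)   # matched terms / GPT-4 terms
recall = len(overlap) / len(report_terms)    # matched terms / report terms
f1 = 2 * precision * recall / (precision + recall)

# Spearman's rank correlation compares the frequency ranks of shared terms
# (here, list position stands in for frequency rank).
shared = sorted(overlap)
report_ranks = [report_terms.index(t) + 1 for t in shared]
gpt4_ranks = [gpt4_terms.index(t) + 1 for t in shared]
rho, p_value = spearmanr(report_ranks, gpt4_ranks)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} rho={rho:.2f}")
```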

Finally, chi-squared analysis revealed that the differences between the frequencies of terms in actual reports and those identified by GPT-4 were statistically significant.
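A chi-squared comparison of term frequencies could be sketched along these lines, again with made-up counts purely to illustrate the statistic rather than to reproduce the study's analysis:

```python
from scipy.stats import chisquare

# Hypothetical term counts -- not the study's actual data.
observed_gpt4 = [120, 90, 45, 30]      # frequencies implied by GPT-4's list
expected_reports = [150, 80, 50, 20]   # frequencies observed in the reports

# chisquare expects observed and expected totals to match, so rescale first.
scale = sum(observed_gpt4) / sum(expected_reports)
expected_scaled = [e * scale for e in expected_reports]

stat, p_value = chisquare(f_obs=observed_gpt4, f_exp=expected_scaled)
print(f"chi2={stat:.2f}, p={p_value:.4f}")  # a small p-value indicates the frequency distributions differ
```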

The study authors concluded that incorporating AI tools such as ChatGPT could boost patient communication and engagement in radiology. They added that this could contribute to better healthcare outcomes.

