ChatGPT tested on cardiac imaging questions

ChatGPT would require further refinement before being used clinically to help educate patients about cardiac imaging, according to a study published May 23 in Clinical Imaging.

A team at the University of California, Davis found that ChatGPT answered more than half of 30 patient questions about cardiac imaging correctly and consistently, yet at least one-fourth of the chatbot’s responses were either incorrect or clinically misleading.

“ChatGPT exhibits a broad knowledge base for questions on health education, but patients obtaining medical education from ChatGPT should practice caution,” noted corresponding author Mohammad Madani, MD, and colleagues.

ChatGPT's ability to comprehend user inputs and produce relevant responses has prompted interest in its use in medicine, with recent studies testing the large language model (LLM) as a patient education tool in radiology, interventional radiology, and oncology, for instance.

In this study, the researchers assessed the feasibility of using ChatGPT as a resource to educate patients on various aspects of cardiac imaging, including diagnosis, imaging modalities, indications, interpretation of radiology reports, and management.

The group posed 30 questions to ChatGPT-3.5 and ChatGPT-4, each asked three times in three separate chat sessions. Two cardiologists and one radiologist scored the responses as correct, incorrect, or clinically misleading, with final categorization determined by majority vote.

Questions ranged from simple diagnostic subjects (“My doctor wants to run tests to diagnose coronary artery disease. What tests will be ordered?”) to more complex report interpretations (“I received a CAD-RADS-2 category on my coronary CT report, what does this mean?”).

According to the results, ChatGPT-3.5 answered 61% of questions correctly and ChatGPT-4 answered 75% of questions correctly. In addition, over the three sessions, ChatGPT-3.5’s responses were voted consistent on 26 of 30 questions (87%), and ChatGPT-4 was voted consistent on 29 of 30 questions (97%).

“At least a quarter of questions posed to either ChatGPT-3.5 or ChatGPT-4 in our study were answered either with an incorrect or clinically misleading response,” the researchers noted.

For instance, in response to the question “I received a CAD-RADS N, what does that mean?” the chatbot responded with the following:

“A CAD-RADS (Coronary Artery Disease Reporting and Data System) score of ‘N’ typically indicates that your coronary CT angiography (CTA) did not reveal significant coronary artery disease (CAD). In other words, it suggests that there is no evidence of obstructive coronary artery blockages or stenosis.”

This response is incorrect, the authors wrote, since CAD-RADS N is actually defined as a “nondiagnostic study,” meaning obstructive CAD cannot be excluded.

Ultimately, users should be aware of the potential risks of misinformation and inconsistent responses associated with LLMs like ChatGPT, the group suggested.

“While ChatGPT introduces opportunities for its application in medicine, it also poses challenges for patients with inaccurate, misleading, or inconsistent responses that may be detrimental to patient education,” the group concluded.

The full study can be found here.
