How ChatGPT, Bard ranked in appropriateness of lung cancer answers

Sunday, November 26 | 1:30 p.m.-1:40 p.m. | S4-SSCH02-4 | Room N228

While ChatGPT and Google Bard were both able to answer nonexpert questions about lung cancer prevention, screening, and terminology commonly used in radiology reports, ChatGPT won this battle of large language models (LLMs), according to this study.

With two radiologists assisting for accuracy, 40 questions were created to compare ChatGPT and Bard. Each question was posed to each model three times, and the answers were evaluated for accuracy and consistency, with consistency defined as agreement among the three answers a model gave to a question, regardless of whether the concept conveyed was correct or incorrect.
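To make that consistency metric concrete, the minimal Python sketch below shows one way such per-question agreement could be tallied. The data structure, labels, and function name are hypothetical illustrations, not details taken from the study itself.

```python
def is_consistent(concept_labels):
    # A question counts as consistent when all of a model's answers to it
    # convey the same concept, whether that concept is correct or not.
    return len(set(concept_labels)) == 1

# Hypothetical reviewer labels for the three answers a model gave to each question.
answers_per_question = {
    "q1": ["concept A", "concept A", "concept A"],  # consistent
    "q2": ["concept A", "concept B", "concept A"],  # inconsistent
    # ... remaining questions ...
}

consistent = sum(is_consistent(labels) for labels in answers_per_question.values())
total = len(answers_per_question)
print(f"{consistent}/{total} questions answered consistently ({consistent / total:.1%})")
```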

In comparing the model outputs, UCLA Health cardiothoracic imaging fellow Amir Ali Rahsepar, MD, and team found that ChatGPT's answers were consistent for 90% of the questions (36 of 40), while Bard's were consistent for only 57.5% (23 of 40).

Of ChatGPT's 120 answers, 85 (70.8%) were correct, 14 (11.7%) were partially correct, and 21 (17.5%) were incorrect. Bard provided answers to only 97 of the 120 prompts; 62 of its answers (51.7% of the 120) were correct, 11 (9.2%) were partially correct, and 24 (20%) were incorrect, according to Rahsepar's findings.
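For readers checking the arithmetic, the percentages above are taken over the full 120 answers (40 questions, each posed three times), which is why Bard's three figures do not sum to 100%. The short sketch below, using only the counts reported here, reproduces them; it assumes that denominator, which is consistent with the reported numbers.

```python
TOTAL = 40 * 3  # 40 questions, each posed three times to each model

# Counts of correct / partially correct / incorrect answers as reported above.
counts = {
    "ChatGPT": {"correct": 85, "partially correct": 14, "incorrect": 21},
    "Bard":    {"correct": 62, "partially correct": 11, "incorrect": 24},
}

for model, tally in counts.items():
    print(f"{model}: answered {sum(tally.values())} of {TOTAL} prompts")
    for grade, n in tally.items():
        # Percentages are computed over all 120 possible answers, so Bard's
        # figures do not add up to 100%.
        print(f"  {grade}: {n} ({n / TOTAL:.1%})")
```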

Although the use of AI offers new possibilities, according to Rahsepar, it also presents challenges that must be carefully reviewed by experts to prevent undue burden on patients and healthcare workers.

“It is essential that LLM developers be aware of the complexity of healthcare decision-making and implement serious guardrails for all healthcare-related interactions,” Rahsepar wrote.

Learn more at this Sunday afternoon session.
