ChatGPT has the edge over Google Bard when it comes to correctly answering lung cancer questions, according to research published June 13 in Radiology.
A team led by Amir Rahsepar, MD, from the University of California, Los Angeles, found that ChatGPT was 1.5 times more likely to give correct or partially correct answers than Google Bard, and it demonstrated more consistency. Still, neither tool achieved 100% accuracy.
"Although the use of [AI] offers new possibilities in fields like medicine, it also presents challenges that must be carefully reviewed by experts to prevent undue burden on patients and healthcare workers," Rahsepar and colleagues wrote.
Large language models have become popular for a variety of uses, including in medicine, and radiologists are exploring the pros and cons of using them. OpenAI released ChatGPT in November 2022, while Google released Google Bard in March 2023.
Rahsepar's group investigated and compared the accuracy and consistency of responses to nonexpert questions on lung cancer generated by the two models. Questions included topics such as cancer prevention, screening, and radiology report terminology. The team based their assessment on Lung-RADS (version 2022) and Fleischner Society recommendations.
The researchers posed the same 40 questions three times each to both models, as well as to the Bing and Google search engines, for a total of 120 answers from each platform. Two radiologists reviewed each answer for accuracy, scoring it as correct, partially correct, incorrect, or unanswered. The radiologists also evaluated each platform's consistency across the repeated questions.
The authors found that ChatGPT answered more questions correctly than Google Bard, Bing, and Google, although the Google search engine produced more partially correct answers than ChatGPT.
Comparison of answers given by AI models and search engines (out of 120 questions)

| Answer type | Google search engine | Bing | Google Bard | ChatGPT |
| --- | --- | --- | --- | --- |
| Correct | 66 | 74 | 62 | 85 |
| Partially correct | 27 | 13 | 11 | 14 |
| Incorrect | 27 | 33 | 24 | 21 |
| Unanswered | 0 | 0 | 23 | 0 |
The researchers reported that ChatGPT was 1.5 times more likely to provide a correct answer than Google Bard, with an odds ratio (OR) of 1.55 (p = 0.004). They also found that ChatGPT and the Google search engine were seven and 29 times more likely to be consistent than Google Bard (OR, 6.65 [p = 0.002] and 28.83 [p = 0.002], respectively).
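For readers unfamiliar with the statistic, the sketch below shows how an odds ratio compares two tools from raw answer counts. The counts and the `odds_ratio` helper are hypothetical illustrations only; the study's reported figures (1.55, 6.65, 28.83) come from its own statistical analysis, not from this naive two-by-two calculation.

```python
# Minimal sketch: odds ratio of tool A vs. tool B from raw answer counts.
# All numbers here are hypothetical; they do not reproduce the study's results.

def odds_ratio(events_a: int, non_events_a: int,
               events_b: int, non_events_b: int) -> float:
    """Return (events_a/non_events_a) divided by (events_b/non_events_b)."""
    odds_a = events_a / non_events_a
    odds_b = events_b / non_events_b
    return odds_a / odds_b

if __name__ == "__main__":
    # Hypothetical example: tool A answers 30 of 40 questions correctly,
    # tool B answers 24 of 40 correctly.
    print(odds_ratio(30, 10, 24, 16))  # (30/10) / (24/16) = 2.0
```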
The study authors suggested that the incorrect answers given by ChatGPT and Google Bard may stem from the models being trained on diverse internet content rather than solely on material from scientific societies.
"It is important to note that ChatGPT's training only goes up until September 2021, whereas Google Bard benefits from using more recent and updated data," they noted. "As a result, ChatGPT may not be aware of recent guidelines from the American College of Radiology Lung-RADS v2022."
Neither large language model could differentiate between Lung-RADS and Fleischner Society guidelines, the group wrote. The authors recommended that the models be trained to refer to appropriate scientific society guidelines rather than to general websites such as news articles or Wikipedia.
A diverse group of experts should be involved from the beginning of the process of fine-tuning and retraining the models, the authors urged.
"These experts can provide valuable human feedback, as well as integrate the feedback and judgment of expert medical groups into the models' training data," they wrote.