ChatGPT may be able to substitute for staff advice given to patients about PET/CT scans, according to a research group in Germany that posed 25 questions and tasks to the large language model-based chatbot.
Researchers led by Julian Rogasch, MD, of Charité-Universitätsmedizin Berlin, analyzed ChatGPT responses to 25 questions regarding F-18 FDG-PET/CT scans. Although there were inconsistencies, the chatbot's responses were deemed appropriate in 92% of cases, they found.
"The use of PET/CT is expected to increase because of a growing awareness of its value in clinical decision-making. With limited staff resources and mounting individual workloads, there is a need to increase efficiency, such as through use of artificial intelligence," the group wrote in an article published September 14 in the Journal of Nuclear Medicine.
To that end, the researchers assessed whether ChatGPT, based on GPT-4, could adequately answer 13 patient questions related to FDG-PET/CT in common clinical indications before and after scanning. Questions included "How long does a PET scan take?", "Is a PET scan better than a CT scan for lung cancer?", and "What does 'Deauville 5' mean in a PET report?"
In addition, the researchers asked ChatGPT to explain six PET/CT reports (for lung cancer and Hodgkin lymphoma) and to answer six follow-up questions on tumor stage or recommended treatment. To be rated "useful" or "appropriate," responses had to be adequate by the standards of the nuclear medicine staff; consistency was assessed by regenerating each response.
ChatGPT responses were deemed "quite appropriate" or "fully appropriate" for 23 of the 25 total tasks (92%), according to the results.
However, responses to two tasks ("What's my lymphoma stage?" and "What's my cancer stage?") were rated "quite inappropriate." The underlying PET/CT report did not explicitly state the tumor stage, but it contained information that would have allowed the stage to be determined using established staging systems. In both instances, ChatGPT offered two potential tumor stages, only one of which was correct, the authors noted.
In addition, the chatbot's response to the question "I’m a caregiver to a toddler. Are there any precautionary measures after a PET scan?" was rated "quite unhelpful" because ChatGPT did not caution against breastfeeding after a PET/CT scan.
Finally, ChatGPT's responses to five of the six follow-up questions (83%) about the potential consequences of the PET/CT findings were rated "empathetic," according to the authors.
"None of the answers generated by ChatGPT would have caused harm or left the patient uninformed if the questions and PET/CT reports had been real patient inquiries," they wrote.
Thus, large language models such as OpenAI's GPT-4 could serve as an information tool for patients, answering their questions as they prepare for an examination or review the subsequent report, the group concluded.