Patients are better served by the education material on the Society of Interventional Radiology's (SIR) website than by ChatGPT output on the same topics, according to a group in Boston.
Researchers led by Colin McCarthy, MD, of Harvard Medical School, pitted SIR patient education copy against ChatGPT's responses on the same topics. They found that the chatbot's summaries were useful but were wordier, more difficult to read, and inaccurate in a notable share of cases.
"The ChatGPT platform may produce incomplete or inaccurate patient educational content, and providers should be familiar with the limitations of the system in its current form," the group wrote.
In 2017, an estimated 74.4% of patients used the internet as the first source of information when researching a health condition. Given the potential role of large language models (LLMs) like ChatGPT in patient education, the researchers aimed to assess the chatbot's accuracy, completeness, and readability compared with a traditional source.
The researchers culled content from the SIR's Patient Center website and derived 104 questions based on the headings of each topic's landing page. These questions were entered once into ChatGPT-3.5 over the course of two days in December 2022. The SIR website content and the ChatGPT output were then analyzed for word and sentence count, readability on multiple validated scales, and factual correctness and suitability for patient education.
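For readers who want to run a similar comparison on their own material, a minimal sketch of this kind of readability analysis is shown below. It assumes the open-source Python textstat package; the study does not name its tooling, and the sample passages are illustrative placeholders, not text from the SIR website or the study.

```python
# Minimal sketch of a readability comparison, assuming the Python "textstat"
# package; the study does not specify its tooling, so this is illustrative only.
import textstat

def readability_summary(text: str) -> dict:
    """Return word/sentence counts and several validated readability scores."""
    return {
        "words": textstat.lexicon_count(text),
        "sentences": textstat.sentence_count(text),
        "flesch_reading_ease": textstat.flesch_reading_ease(text),
        "flesch_kincaid_grade": textstat.flesch_kincaid_grade(text),
        "gunning_fog": textstat.gunning_fog(text),
        "smog_index": textstat.smog_index(text),
        "coleman_liau_index": textstat.coleman_liau_index(text),
    }

# Hypothetical passages standing in for website copy and a chatbot answer.
website_passage = "Angioplasty opens narrowed blood vessels using a small balloon."
chatbot_passage = ("Angioplasty is a minimally invasive endovascular procedure "
                   "in which a balloon-tipped catheter is used to dilate a "
                   "stenotic or occluded blood vessel, thereby restoring flow.")

for label, text in [("Website", website_passage), ("Chatbot", chatbot_passage)]:
    print(label, readability_summary(text))
```

Longer sentences and more polysyllabic wording push the grade-level scores up, which is the pattern the study reported for the ChatGPT output.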
In total, 21,154 words were analyzed, including 7,917 words from the website and 13,377 words representing the total output of the ChatGPT platform across 22 text passages.
According to the analysis, output from the ChatGPT platform was longer than the SIR website content and scored as more difficult to read on four of five readability scales. In addition, the ChatGPT output was incorrect for 12 of 104 (11.5%) questions.
The researchers also found that content from both the website and ChatGPT was significantly above the recommended fifth- or sixth-grade level for patient education, with a mean Flesch-Kincaid Grade Level of 11.1 (± 1.3) for the website and 11.9 (± 1.6) for the ChatGPT content.
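For context, the Flesch-Kincaid Grade Level maps average sentence length and syllables per word onto U.S. school grades. The sketch below uses the standard published formula; the word, sentence, and syllable counts are illustrative placeholders, not figures from the study.

```python
def flesch_kincaid_grade(total_words: int, total_sentences: int,
                         total_syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level formula."""
    return (0.39 * (total_words / total_sentences)
            + 11.8 * (total_syllables / total_words)
            - 15.59)

# Illustrative numbers only (not from the study): a passage with 120 words,
# 6 sentences, and 200 syllables lands at roughly the 12th-grade level.
print(round(flesch_kincaid_grade(120, 6, 200), 1))  # -> 11.9
```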
"As patients and their caregivers continue seeking electronic sources of health information, the medical community should recognize that their first stop may not be a societal website or other sources of ground truth," the authors noted.
Ultimately, however, such sources may benefit from improvements, specifically to ensure the content is understandable to the majority of readers, they wrote.
"Opportunities may exist to fine-tune existing large language models, which could be optimized for the delivery of patient educational content," McCarthy and colleagues concluded.
The article was published online June 16 in the Journal of Vascular and Interventional Radiology.