How GPTs, Bard performed on readability, complexity of lung cancer health information

Sunday, November 26 | 1:50 p.m.-2:00 p.m. | S4-SSCH02-6 | Room N228

This session will reveal how well large language models (LLMs) summarize and simplify health information about lung cancer prevention and screening.

Despite increasing attention on using LLMs to generate health information, little has been reported about their capability to produce readable health information that is accessible to the general public, according to presenter Hana Haver, MD, a breast imaging fellow at Massachusetts General Hospital, and colleagues.

For the study, Haver’s team evaluated ChatGPT’s answers to common questions about lung cancer prevention and screening. They then asked ChatGPT, GPT-4, and Bard to simplify the answer set, and assessed the simplified answers for language complexity (Flesch Reading Ease) and readability on five established scales. A Flesch Reading Ease score of 60 or greater and a reading grade level of 8 or below were considered adequate for an average audience.
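For readers unfamiliar with the metric, the Flesch Reading Ease score is derived from average sentence length and average syllables per word, with higher scores indicating easier text. The Python sketch below is illustrative only, not the study’s tooling; it uses a deliberately naive vowel-group syllable counter and applies the 60-or-greater threshold mentioned above.

```python
import re

def count_syllables(word: str) -> int:
    """Naive syllable estimate: count groups of consecutive vowels (minimum 1)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease:
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)

# Hypothetical patient-facing sentence, used only to demonstrate the threshold.
sample = "Lung cancer screening uses a low-dose CT scan to look for cancer early."
score = flesch_reading_ease(sample)
print(f"score = {score:.1f}, adequate = {score >= 60}")
```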

Statistical analysis used paired t-tests to compare readability scores between the original and simplified answers for each model. In addition, three fellowship-trained cardiothoracic radiologists blindly rated the simplified answers for clinical appropriateness.
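A paired t-test is appropriate here because each question yields two linked scores, one before and one after simplification. A minimal sketch of that comparison follows, using SciPy’s `ttest_rel`; the grade-level values are invented for illustration and are not the study’s data.

```python
from scipy import stats

# Hypothetical reading grade levels for the same ten answers,
# scored before and after simplification (illustrative values only).
original   = [11.2, 10.8, 12.5, 9.9, 11.7, 10.4, 12.1, 11.0, 10.6, 11.9]
simplified = [7.8, 8.1, 8.6, 7.2, 8.0, 7.5, 8.3, 7.9, 7.4, 8.2]

# Paired t-test: each original answer is matched with its simplified version.
t_stat, p_value = stats.ttest_rel(original, simplified)
print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}")
```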

While the LLMs demonstrated the capability to simplify the language of lung cancer information to a level more accessible to the general public, the number of inappropriate or inconsistent answers among the simplified content was “nontrivial,” suggesting that further study is required, Haver’s team found.

Review the baseline answer scores, overall mean readability, mean language complexity, and more at this session.
