Most published articles testing ChatGPT in radiology show that the technology offers impressive performance, though there remains room for improvement, according to a team in Cleveland.
A group led by Dr. Kaustav Bera of University Hospitals Cleveland Medical Center studied the published literature on ChatGPT and radiology over the nine months since the chatbot's public release and detailed the scope of the work. The team's findings were published October 20 in Current Problems in Diagnostic Radiology.
“While mainstream news harped on the coming AI apocalypse and AI doomerism at the forefront, with ChatGPT taking away everyone's jobs and being all-knowing and omniscient, academia was hard at work trying to harness its power in a myriad of different ways,” the group wrote.
Bera and colleagues performed a literature search and identified 51 published articles involving ChatGPT and radiology/imaging dated between January 26 and August 14. Twenty-three articles were original research, while the rest included reviews, perspectives, and brief communications.
Most articles discussed the multifaceted nature of the tool and explored its potential impact on virtually every aspect of radiology, from patient education, preauthorization, and protocol selection to generating differentials and structuring radiology reports, the group wrote.
The articles appeared in 10 different journals, with impact factors ranging from 1.2 to 19.7. Publication volume increased over time, peaking at 16 articles in July.
For a quantitative analysis, the researchers included the 23 original research and 17 non-original research articles and excluded 11 letters written in response to previous articles. They devised a scoring system based on multiple questions (for example, “Is the question the article is answering important to radiologists?” and “How rigorous are the materials and methods?”), with two experienced academic radiologists then evaluating the studies.
The mean score for original research was 3.20 out of 5 (across five questions), while the mean score for non-original research was 1.17 out of 2 (across six questions), according to the findings.
“In objective evaluation of the performance of the published articles, we noticed that most of the articles were high-quality, which is impressive considering that ChatGPT has only been around for nine months,” the researchers wrote.
They noted that, because radiology is an image-rich field, researchers and radiologists primarily had to find ways in which ChatGPT could assist despite its inability to analyze images and its reliance on text-based prompts.
While it's “still early days,” the group wrote, the initial signs are extremely promising. With time and further improvements in the technology, ChatGPT could go a long way toward revolutionizing the field of radiology, especially in noninterpretive tasks.
“This requires more research as well as weeding out the limitations in the technology, most important of which are reported inconsistencies including generating factually incorrect information (hallucinations),” the group concluded.