Radiology-specific LLM generates professional report impressions

AI can generate radiologic report impressions that are professionally and linguistically appropriate for a full spectrum of radiology examinations, according to a study published September 17 in Radiology.

A team led by PhD candidate Lu Zhang of Shanghai Jiao Tong University in Shanghai, China, developed a large language model (LLM) that generates report impressions (interpretive summaries) from imaging findings and evaluated its performance along professional and linguistic dimensions.

“This study demonstrated that an LLM can learn to generate readable and clinically valuable radiologic impressions for various imaging modalities and anatomic sites,” the group wrote.

Radiology reports document key findings from image interpretation and include an impression section meant to provide a concise summary and interpretation of those findings. Automatically generating impressions could help reduce radiologists' workloads, but it poses a technical challenge because of the complexity of radiologic findings, the authors noted.

Previous research has leveraged general LLMs to summarize imaging findings in chest x-ray reports, and in this study, the researchers aimed to extend report impression generation to a comprehensive range of imaging modalities and anatomic sites.

The researchers started with a 7-billion-parameter LLM (WiNGPT-7B) and further pretrained it on an additional 20 gigabytes (GB) of medical and general-purpose text. They then fine-tuned the model on a 1.5-GB data set that included 800 radiology reports.
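As a rough illustration of this kind of findings-to-impression fine-tuning, the sketch below pairs report findings with their final impressions and trains a causal language model on the concatenated text using the Hugging Face transformers library. The model path, prompt template, example reports, and hyperparameters are illustrative assumptions and do not reflect the authors' actual pipeline.

```python
# Minimal supervised fine-tuning sketch: train a causal LLM to continue
# "Findings: ..." text with an "Impression: ..." summary.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE_MODEL = "path/to/pretrained-7b-model"  # placeholder, not the study's checkpoint

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # causal models often lack a pad token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Hypothetical report pairs: findings text in, radiologist's impression out.
reports = [
    {"findings": "Heart size is normal. No focal consolidation or effusion.",
     "impression": "No acute cardiopulmonary abnormality."},
]

def to_features(example):
    # Concatenate findings and impression into one training sequence.
    text = f"Findings: {example['findings']}\nImpression: {example['impression']}"
    return tokenizer(text, truncation=True, max_length=1024)

dataset = Dataset.from_list(reports).map(
    to_features, remove_columns=["findings", "impression"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="impression-llm",
                           num_train_epochs=3,
                           per_device_train_batch_size=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```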

The researchers first tested the model on a data set that included report findings from 3,988 patients. Compared with the final impressions of expert radiologists (the reference standard), the LLM-generated impressions achieved a median recall score of 0.775, a precision score of 0.84, and an F1 score (the harmonic mean of recall and precision) of 0.772.
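For context, recall, precision, and F1 are typically computed per report from the word overlap between the generated and reference impressions, as in the minimal sketch below; the tokenization and matching rules used in the study may differ, and because the reported values are per-case medians, the median F1 cannot be derived directly from the median recall and precision.

```python
# Illustrative token-overlap scoring of a generated impression against a
# radiologist's reference impression (not necessarily the study's exact metric).
from collections import Counter

def overlap_scores(generated: str, reference: str):
    gen_tokens = generated.lower().split()
    ref_tokens = reference.lower().split()
    # Count tokens appearing in both texts, clipped to avoid double counting.
    overlap = sum((Counter(gen_tokens) & Counter(ref_tokens)).values())
    precision = overlap / len(gen_tokens) if gen_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Example: precision 1.0, recall 0.67, F1 0.8 for these two hypothetical impressions.
print(overlap_scores("no acute intracranial abnormality",
                     "no acute intracranial hemorrhage or abnormality"))
```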

Next, to assess the linguistic quality of the output, the researchers tested the model on a subset of imaging findings from 1,014 patients. Of these, 510 were CT exams, 401 MRI, 67 radiography, and 36 mammography. The subset covered craniofacial, neck, chest, upper abdomen, lower abdomen, vascular, bone and joint, spine, and breast exams.

An expert panel rated the LLM-generated impressions against the radiologists' final impressions across linguistic dimensions on a five-point Likert scale. The panel's ratings indicated that the LLM-generated impressions closely aligned with the radiologists' final impressions, with a median score of 5 ("strongly agree").

“The developed LLM generated radiologic impressions that were professionally and linguistically appropriate for a full spectrum of radiology examinations,” the researchers wrote.

The group noted limitations, namely that the model was developed and assessed at a single center and that the value of the LLM will need to be validated in a multicenter setting.

Nonetheless, the study illustrates that the emergent capabilities of LLMs may offer a way to learn the knowledge required in radiology and address complex clinical challenges, the authors suggested.

“The process of learning a large body of knowledge to gain the ability to interpret a full spectrum of radiology examinations, which is similar to how humans learn, affirms the feasibility of LLM emergent capability,” they concluded.

