How imaging AI developers can avoid pitfalls when testing algorithms

AI developers can recognize and avoid pitfalls when creating models and tools for interpretive imaging, according to an analysis published July 24 in the American Journal of Roentgenology.

In a clinical perspective, researchers led by Seyed Tabatabaei, MD, from Massachusetts General Hospital and Harvard Medical School in Boston outlined these pitfalls and offered suggestions on AI model training and validation, as well as on the use of diverse datasets.

“Interpretive AI tools are poised to change the future of radiology,” Tabatabaei and colleagues wrote. “However, certain issues must be addressed before these algorithms are fully integrated into clinical practice.”

Radiology departments continue to adopt AI imaging tools into their clinical workflows. While studies have demonstrated the promise of these tools in action, the researchers caution that several pitfalls exist that could cause them to deliver false-positive or false-negative results.

CT images from a 24-year-old woman with history of ventriculopleural shunt placement who presented with chest pain show two small linear hyperattenuating structures (A, arrows), with larger hyperattenuating structure oriented parallel and very close to the rib. The AI algorithm interpreted the finding as a rib fracture. (B) An additional axial image from same exam indicates that the finding relates to the patient’s ventriculopleural shunt (arrow), passing alongside the rib. The radiologist made a correct interpretation upon assessment of the entire exam. Image courtesy of the ARRS.

Tabatabaei and co-authors outlined these pitfalls and how they lead to AI errors. They also offered potential strategies for AI developers to consider when training and validating models.

“Successful clinical deployment of AI tools requires that radiologist end-users understand such pitfalls along with other limitations of the available models,” they wrote.

  • Anatomic variants and age-related changes – The authors wrote that AI must recognize patterns of imaging manifestations stemming from anatomic variants and age-related changes. Once errors are identified, AI algorithms may be refined to learn to correctly recognize imaging findings.
  • Postoperative changes and medical devices – Previous reports suggest that AI algorithms “are not properly trained” to recognize postoperative changes and how they impact image interpretation, the authors noted. Also, medical devices such as catheters, implants, prosthetics, or pacemakers can impact AI’s performance. The authors suggested that incorporating scout images can improve model accuracy since they provide a comprehensive view of the body.
  • Image artifacts – Beam hardening and motion artifacts can negatively impact the performance of AI models. The team highlighted that historically, motion artifacts have been a common exclusion criterion for image selection within AI training sets.
  • Integrating prior and concurrent imaging exams – AI models may learn to interpret only a single imaging exam, without considering other relevant imaging exams, which may lead to diagnostic errors. The authors suggested that AI models should be trained to integrate and register prior and concurrent imaging exams into their interpretation, including exams from different modalities (see the registration sketch after this list). “Potentially, AI could learn to similarly compare findings with earlier examinations,” they wrote.
  • Integrating patient medical history and other clinical info – By integrating medical history and other clinical information, as well as other available imaging exams, the team wrote that AI tools could learn to provide a holistic interpretation of a patient’s pathology.
  • Satisfaction-of-search effect – This is a phenomenon where radiologists may fail to detect an additional abnormality after an initial abnormality has already been identified on imaging. AI is not immune to this, and the authors wrote that AI algorithms must be developed to continue scanning each image even after detecting an initial pathology (see the detection sketch after this list).
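
As a hedged illustration of the prior-exam integration the authors describe, the sketch below rigidly registers a prior CT to the current one using SimpleITK so both can be fed to a comparison model. The file paths, modality, and parameter choices are placeholder assumptions, not details from the article.

```python
# A minimal registration sketch, assuming SimpleITK is installed and two
# CT volumes exist at the placeholder paths below (both are assumptions).
import SimpleITK as sitk

current = sitk.ReadImage("current_ct.nii.gz", sitk.sitkFloat32)  # placeholder path
prior = sitk.ReadImage("prior_ct.nii.gz", sitk.sitkFloat32)      # placeholder path

reg = sitk.ImageRegistrationMethod()
# Mutual information tolerates differences in contrast between exams.
reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
reg.SetOptimizerAsRegularStepGradientDescent(
    learningRate=1.0, minStep=1e-4, numberOfIterations=200)
reg.SetInitialTransform(
    sitk.CenteredTransformInitializer(
        current, prior, sitk.Euler3DTransform(),
        sitk.CenteredTransformInitializerFilter.GEOMETRY))
reg.SetInterpolator(sitk.sitkLinear)

transform = reg.Execute(current, prior)
# Resample the prior into the current exam's space so a downstream model
# can compare findings at matching anatomic locations.
prior_aligned = sitk.Resample(prior, current, transform, sitk.sitkLinear, 0.0)
```

A rigid Euler transform is the simplest choice here; a deformable transform could replace it if anatomy shifts substantially between exams.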

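And as a minimal sketch of the satisfaction-of-search mitigation, the following Python stub keeps scanning a model's candidate findings rather than returning after the first hit. The Finding type, labels, scores, and threshold are hypothetical placeholders, not the article's method.

```python
# A minimal sketch: exhaustively report all findings above threshold
# instead of stopping at the first, mirroring the authors' suggestion
# that algorithms keep scanning after an initial detection.
from dataclasses import dataclass

@dataclass
class Finding:
    label: str
    score: float
    bbox: tuple  # (x, y, w, h) in pixel coordinates

def detect_all(candidates, threshold=0.5):
    """Keep every candidate above threshold; no early return."""
    findings = [c for c in candidates if c.score >= threshold]
    return sorted(findings, key=lambda f: f.score, reverse=True)

# Hypothetical model output: a rib fracture AND an incidental lung nodule.
candidates = [
    Finding("rib_fracture", 0.91, (40, 60, 12, 8)),
    Finding("lung_nodule", 0.72, (130, 85, 9, 9)),
    Finding("noise", 0.21, (10, 10, 4, 4)),
]
print(detect_all(candidates))  # reports both true findings, not just the first
```
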
The perspective authors concluded that, as AI's diagnostic capabilities evolve, continuous learning through an iterative feedback loop is needed. This includes using explainable AI techniques such as heat maps to build trust among radiologists seeking to adopt algorithms into their clinical workflows, as well as extending AI to additional tasks related to imaging interpretation.
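
As an illustration of the heat-map explainability the authors mention, here is a minimal Grad-CAM-style sketch in PyTorch. The ResNet-18 stand-in model, layer choice, and random input tensor are assumptions for demonstration only, not the method described in the AJR perspective.

```python
# A minimal Grad-CAM sketch: weight the last conv block's feature maps by
# their average gradients to produce a heat map over the input image.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None)  # stand-in for an imaging classifier (assumption)
model.eval()

activations, gradients = {}, {}

def fwd_hook(module, inputs, output):
    activations["feat"] = output.detach()

def bwd_hook(module, grad_input, grad_output):
    gradients["feat"] = grad_output[0].detach()

layer = model.layer4  # last conv block; layer choice is an assumption
layer.register_forward_hook(fwd_hook)
layer.register_full_backward_hook(bwd_hook)

x = torch.randn(1, 3, 224, 224)  # placeholder image tensor
scores = model(x)
scores[0, scores.argmax()].backward()  # backprop the top predicted class

# Channel weights = spatially averaged gradients; ReLU keeps positive evidence.
w = gradients["feat"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((w * activations["feat"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # heat map in [0, 1]
```

Overlaying such a map on the source image lets a radiologist see which regions drove the model's prediction, which is the trust-building role the authors assign to heat maps.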

“Throughout this process, developers and radiologists must maintain close communication, to foster an adaptive yet precise AI framework that will ultimately advance radiology practice,” they wrote.

Jan Vosshenrich, MD, from University Hospital Basel in Switzerland, echoed that sentiment in an accompanying editorial. He wrote that developing and implementing explainable and adaptive interpretive AI frameworks requires close collaboration between developers and radiologists, as well as an in-depth understanding of a radiologist’s image interpretation process.

"Incorporating these and other perspectives outlined ... the precision of AI frameworks will increase and advance radiology practice," Vosshenrich added.

