Artificial intelligence (AI) algorithms for medical imaging must be effectively evaluated before they are used in clinical practice, according to a commentary published November 8 in Radiology.
Assessing AI's diagnostic performance -- both as an add-on tool and as a standalone technology -- is crucial, wrote a team led by Dr. Seong Ho Park, PhD, of the University of Ulsan College of Medicine in Seoul, South Korea.
"[Clinical] evaluation of AI algorithms before adoption in practice is critical," the team wrote.
AI technology can be assessed both for any diagnostic benefit it offers over conventional care and for the effect of an AI intervention on patient care, Park and colleagues explained. It's also important to determine where the AI intervention fits best in the diagnostic process -- whether as an add-on to clinician diagnosis or as a standalone tool.
The group recommended the following steps for evaluating AI-assisted technology intended for medical diagnosis:
- Conduct testing with data outside of the AI model's training and validation sets. "AI algorithms tend to have excellent accuracy for training data and in the development environment, but their performance often deteriorates in the case of external data from real-world practice not used for training," the group wrote.
- Conduct both paired and parallel studies to compare conventional and AI-assisted diagnoses. "For comparing the performance between AI-unassisted and AI-assisted interpretations ... a paired design is effective," the team explained. "A parallel design can be used to compare the performance of AI-unassisted and AI-assisted diagnoses ... and [is] generally better suited for evaluating the effect of an AI intervention (i.e., comparison of the outcomes between conventional care and AI-assisted care)."
- Use a guideline to design AI studies and report their results. The EQUATOR Network, a health research support tool, has a library of at least six guidelines suitable for AI evaluation. "These guidelines aid not only the reporting and appraisal of studies but also the design," Park and colleagues wrote.
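As a rough illustration of the paired design in the list above, the same cases can be read once without AI and once with AI, and the two sets of results compared case by case. The sketch below is not taken from the commentary: the reader results are made up, the function name `mcnemar_exact` is hypothetical, and the exact McNemar test is one common choice for paired binary outcomes, not a method the authors prescribe.

```python
from math import comb


def mcnemar_exact(unassisted_correct, assisted_correct):
    """Exact two-sided McNemar test for paired binary outcomes.

    Each list holds one boolean per case: True if the interpretation
    was correct. Both lists must refer to the same cases in the same
    order (that is what makes the design paired).
    Returns (b, c, p_value).
    """
    if len(unassisted_correct) != len(assisted_correct):
        raise ValueError("paired design needs one result per case in each arm")

    pairs = list(zip(unassisted_correct, assisted_correct))
    # b: correct only without AI; c: correct only with AI.
    b = sum(u and not a for u, a in pairs)
    c = sum(a and not u for u, a in pairs)

    n = b + c  # only discordant pairs carry information
    if n == 0:
        return b, c, 1.0

    # Under the null hypothesis, discordant pairs split 50/50 (binomial).
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical results for 10 cases (illustration only, not real data).
unassisted = [True] * 6 + [False] * 4
assisted = [True] * 5 + [False] + [True] * 3 + [False]

b, c, p_value = mcnemar_exact(unassisted, assisted)
print(b, c, p_value)
```

With these made-up results, one case is correct only without AI and three are correct only with AI, giving a p-value of 0.625. A real study would need far more cases and a prespecified analysis plan. A parallel design, by contrast, would assign different patients to each arm and compare outcomes between the groups.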
The authors hope their suggestions will translate into further research.
"Clinical evaluation aims to confirm acceptable AI performance through adequate external testing and confirm the benefits of AI-assisted care relative to conventional care through adequately designed and conducted studies, for which prospective studies are desirable," they wrote.