AI models don't hold up when applied to external test datasets

Wednesday, December 1 | 3:00 p.m.-4:00 p.m. | SSPH12-2 | Room TBA
Many artificial intelligence (AI)-based algorithms have shown promise in detecting COVID-19 on chest x-rays; however, the team behind this afternoon's presentation asks, "Can a deep-learning model trained at one clinical site be applied to other sites?" According to the group, the study offers the first set of scientific data on the generalizability of AI models in radiology.

The researchers first developed a deep neural network model using 19,784 images (5,725 positive COVID-19 cases) from four sites within the Henry Ford Health System in Michigan. They then tested the model's generalizability to seven external, publicly available test sets.

Among the four internal test sites, the model achieved area under the curve (AUC) values ranging from 0.84 to 0.87 when identifying COVID-19 on x-ray. In contrast, when the model was applied to external test datasets, AUC performance ranged from 0.77 to 0.83.
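The AUC values above summarize each test set with the area under the ROC curve: the probability that a randomly chosen COVID-19-positive x-ray receives a higher model score than a randomly chosen negative one. A minimal sketch of that comparison, using made-up labels and scores (not the study's data or model) and a pure-Python pairwise AUC:

```python
def auc_on_dataset(y_true, y_score):
    """AUC via the Mann-Whitney pairwise formulation:
    fraction of (negative, positive) pairs the model ranks correctly,
    counting ties as half-correct."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    pairs = [(n, p) for n in neg for p in pos]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for n, p in pairs)
    return wins / len(pairs)

# Hypothetical internal test set: scores separate the classes well.
internal_auc = auc_on_dataset([0, 0, 0, 1, 1, 1],
                              [0.1, 0.2, 0.4, 0.35, 0.8, 0.9])

# Hypothetical external test set: a distribution shift blurs the separation,
# so more negative cases outscore positive ones and the AUC drops.
external_auc = auc_on_dataset([0, 0, 0, 1, 1, 1],
                              [0.2, 0.5, 0.6, 0.35, 0.55, 0.9])

print(f"internal AUC = {internal_auc:.2f}")  # 0.89
print(f"external AUC = {external_auc:.2f}")  # 0.67
```

The gap between the two printed values mirrors, in toy form, the internal-versus-external performance drop the researchers report.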

"AI models trained from one, or a limited number of clinical sites, will drop in performance when they are applied to external test datasets," the researchers wrote.

Attend the session to learn more.
