Researchers from the University of Maryland used computational “stress tests” to assess the robustness of the model that won the 2017 RSNA Pediatric Bone Age Challenge with a concordance of 0.991 with the radiologist-determined ground truth. They found that the algorithm generalized well to external data, but it also produced inconsistent predictions, along with more clinically significant errors, on images that had undergone simple transformations reflective of clinical variations in radiograph processing.
These transformations included rotation, flipping, brightness adjustment, contrast adjustment, pixel inversion, the addition of a standard radiological laterality marker, and resolution changes from the baseline of 1024 x 1024 pixels.
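For context, perturbations in these categories can be generated with standard image-processing tools. The sketch below is not the study's actual code; the parameter values, file path, and model interface are illustrative assumptions only, intended to show how a radiograph might be perturbed and re-scored to check for inconsistent predictions.

```python
# Minimal sketch (not the study's code) of perturbations in the categories above,
# using Pillow. Parameter values are illustrative assumptions.
from PIL import Image, ImageDraw, ImageEnhance, ImageOps

def stress_test_variants(path):
    """Return a dict of perturbed copies of the radiograph at `path`."""
    base = Image.open(path).convert("L").resize((1024, 1024))  # baseline resolution

    marked = base.copy()  # simulate a laterality marker burned into the image
    ImageDraw.Draw(marked).text((20, 20), "L", fill=255)

    return {
        "rotated_10deg": base.rotate(10),                           # rotation
        "flipped": ImageOps.mirror(base),                           # horizontal flip
        "brighter": ImageEnhance.Brightness(base).enhance(1.5),     # brightness adjustment
        "lower_contrast": ImageEnhance.Contrast(base).enhance(0.5), # contrast adjustment
        "inverted": ImageOps.invert(base),                          # pixel inversion
        "laterality_marker": marked,                                # added "L" marker
        "downsampled": base.resize((512, 512)),                     # resolution change
    }

# Each variant would then be run through the bone-age model and its prediction
# compared against the unperturbed baseline to flag inconsistent outputs.
```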
“Our results indicate that [deep-learning] models may not perform as expected in the real world and that they should be thoroughly stress tested prior to deployment in order to determine if they are ‘clinic ready,’” wrote presenter Samantha Santomartino and colleagues.
Sit in on this Sunday morning presentation to learn more.