Doctoral candidate Alejandro Rodriguez-Ruiz of Radboud University Medical Center in Nijmegen, the Netherlands, and colleagues found that the performance of a deep-learning algorithm was not significantly different from the average performance of six radiologists.
The study's radiologists reviewed 155 digital mammography exams (73 malignant and 82 benign), rating any identified lesions for suspicion of malignancy on a scale of 0 to 10. The researchers then applied a commercially available computer system based on deep-learning technology to the same dataset. The software identifies soft-tissue lesions and calcifications and produces a cancer-suspicion score on the same 0-to-10 scale used by the radiologists. The group compared the radiologists' performance with that of the software using the area under the receiver operating characteristic curve (AUC).
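For readers unfamiliar with the metric, AUC is computed from the per-exam suspicion scores and the ground-truth labels: it is the probability that a randomly chosen malignant exam receives a higher score than a randomly chosen benign one. Below is a minimal sketch of that calculation in Python using scikit-learn; the scores are synthetic values invented purely for illustration and are not the study's data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the study's 155 exams: 73 malignant (label 1)
# and 82 benign (label 0). The 0-10 suspicion scores are hypothetical,
# used only to demonstrate the calculation.
labels = np.concatenate([np.ones(73, dtype=int), np.zeros(82, dtype=int)])
scores = np.concatenate([
    rng.uniform(3, 10, size=73),  # invented scores for malignant exams
    rng.uniform(0, 7, size=82),   # invented scores for benign exams
])

# AUC: probability that a randomly chosen malignant exam is scored
# higher than a randomly chosen benign exam.
auc = roc_auc_score(labels, scores)
print(f"AUC = {auc:.2f}")
```

In the study, the same calculation would be run once per reader on the radiologists' scores and once on the software's scores, yielding the figures compared below.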
On average, the radiologists achieved an AUC of 0.83 for the whole dataset, while the deep-learning system reached an AUC of 0.79.
The findings suggest that deep learning could offer additional support to busy radiology practices, the researchers concluded.
"Computer systems with similar clinical performance as radiologists could be used, for instance, as double reading, to automatically discriminate normal cases, or to shorten reading time," they wrote.