An artificial intelligence (AI) algorithm improved the performance of nonradiology physicians and even thoracic radiologists for detecting malignant pulmonary nodules on chest radiographs, according to research published online September 25 in Radiology.
After developing and testing a deep-learning algorithm, researchers led by Ju Gang Nam of Seoul National University Hospital and College of Medicine and Sunggyun Park of AI software developer Lunit found that AI outperformed more than half of the physicians participating in the study, including the mean results from a group of four thoracic radiologists. What's more, all physicians did better when they were able to use the software as a second reader.
The researchers trained and tested a convolutional neural network using 43,292 chest radiographs, as well as four other independent datasets. Next, they conducted an observer performance study on a dataset of 119 patients with pathologically confirmed malignant nodules. The study included 18 physicians: six radiology residents, five board-certified radiologists, four thoracic radiologists, and three nonradiology physicians.
They analyzed the algorithm's performance by calculating the area under the receiver operating characteristic curve (AUROC) and the jackknife alternate free-response receiver operating characteristic figure of merit (JAFROC FOM). On its own, the algorithm had an AUROC of 0.91 on a per-study basis -- significantly higher than that of 11 of the 18 physicians (p < 0.05). In terms of detecting malignant nodules, the algorithm showed a JAFROC FOM of 0.885, higher than that of every physician and significantly higher than that of 15 physicians.
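For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following is a minimal sketch of that pairwise definition, not the researchers' analysis code; the function name `auroc` and the toy labels and scores are hypothetical and are not data from the study.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison: the fraction of (positive, negative)
    pairs in which the positive case scores higher; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example (NOT study data): 1 = malignant nodule present.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)
print(f"Toy AUROC: {toy_auroc:.3f}")
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the algorithm's reported 0.91 should be read.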
"More specifically, [the algorithm] showed high specificity and was able to detect 100% of high conspicuity nodules (score of ≥ 4), most large (> 3 cm) nodules, and more nodules in overlapped areas than the four groups of physicians in our study," the authors wrote.
When used as a second reader, the algorithm led to a statistically significant improvement in classification performance on a per-study basis (mean AUROC improvement of 0.04) for 15 physicians. As for nodule detection, the algorithm resulted in better performance -- a mean JAFROC FOM improvement of 0.043 -- in all physicians, including statistically significant gains in 14 physicians, according to the researchers. Notably, only two of the physicians -- both thoracic radiologists -- produced a higher JAFROC FOM than the algorithm achieved on its own.
Mean JAFROC FOM for detecting malignant pulmonary nodules

| | Nonradiology physicians | Radiology residents | Board-certified radiologists | Thoracic radiologists |
| --- | --- | --- | --- | --- |
| Without AI | 0.691 | 0.796 | 0.821 | 0.833 |
| With AI as a second reader | 0.828 | 0.829 | 0.840 | 0.854 |
The performance improvement from the use of AI was statistically significant (p < 0.05), according to the researchers.
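The per-group gains implied by the table can be worked out directly. The sketch below is illustrative arithmetic only: the FOM values are copied from the table, the group sizes come from the reader counts given for the observer study, and the variable names are hypothetical. Weighting each group's gain by its number of readers gives a figure close to the reported mean improvement of 0.043; the small difference is likely due to rounding in the published group means.

```python
# Mean JAFROC FOM per reader group, copied from the table in the article.
without_ai = {
    "Nonradiology physicians": 0.691,
    "Radiology residents": 0.796,
    "Board-certified radiologists": 0.821,
    "Thoracic radiologists": 0.833,
}
with_ai = {
    "Nonradiology physicians": 0.828,
    "Radiology residents": 0.829,
    "Board-certified radiologists": 0.840,
    "Thoracic radiologists": 0.854,
}
# Number of readers per group, as reported for the observer study (18 total).
group_sizes = {
    "Nonradiology physicians": 3,
    "Radiology residents": 6,
    "Board-certified radiologists": 5,
    "Thoracic radiologists": 4,
}

# Gain in mean FOM for each group, rounded to the table's precision.
gains = {g: round(with_ai[g] - without_ai[g], 3) for g in without_ai}

# Average gain across all readers, weighting each group by its size.
weighted_gain = sum(gains[g] * group_sizes[g] for g in gains) / sum(group_sizes.values())

for group, gain in gains.items():
    print(f"{group}: +{gain:.3f}")
print(f"Reader-weighted mean gain: {weighted_gain:.3f}")
```

The largest gain falls to the nonradiology physicians (+0.137), while the three radiologist groups each gain between roughly 0.02 and 0.03.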