An artificial intelligence (AI) algorithm can assist radiologists in interpreting routine chest x-rays and looking for changes on serial studies. But the software's limitations render it unsuitable for taking over the radiologist's role in reading these exams, according to research published October 4 in PLOS One.
Researchers led by Dr. Ramandeep Singh of Massachusetts General Hospital (MGH) tested a commercial deep-learning algorithm for its ability to detect four findings on chest x-rays. They found that the algorithm performed comparably to consensus reading by two experienced thoracic subspecialty radiologists, and it was at least as accurate as independent interpretations by four other thoracic subspecialty radiologists.
Yet, "though helpful in improving the accuracy of interpretation, the assessed [deep-learning] algorithm is unlikely to replace radiologists due to limitations associated with the categorization of findings (such as pulmonary opacities) and lack of interpretation for specific findings (such as lines and tubes, pneumothorax, fibrosis, pulmonary nodules, and masses)," the authors wrote.
Using the U.S. National Institutes of Health ChestX-ray8 database, Singh and colleagues from MGH and AI software developer Qure.ai gathered 874 deidentified frontal chest x-rays from 724 adult patients and applied the company's software to detect four specific findings: pulmonary opacities, pleural effusions, hilar prominence, and enlarged cardiac silhouette. For the purposes of the study, a reference standard was established in consensus for all 874 studies by two fellowship-trained thoracic subspecialty radiologists with 16 and 12 years of subspecialty experience, respectively.
Four other thoracic subspecialty radiologists -- with five, 25, 30, and 35 years of experience, respectively -- served as the "test" radiologists in the study and independently evaluated 724 exams for the four abnormalities. They also assessed 150 serial chest x-rays performed for follow-up of the initial findings.
After performing receiver operating characteristic (ROC) analysis, the researchers found no statistically significant difference in accuracy between the deep-learning algorithm and the reference standard for any of the four findings. In addition, the algorithm performed at least as well as the four test radiologists, based on the area under the curve (AUC); a minimal example of this type of calculation follows the table below.
AI algorithm vs. radiologists for detecting chest x-ray findings (area under the curve by reader)

| Finding | Radiologist 1 | Radiologist 2 | Radiologist 3 | Radiologist 4 | Deep-learning algorithm |
|---|---|---|---|---|---|
| Enlarged cardiac silhouette | 0.862 | 0.801 | 0.868 | 0.788 | 0.936 |
| Pleural effusion | 0.831 | 0.887 | 0.808 | 0.853 | 0.863 |
| Pulmonary opacity | 0.792 | 0.789 | 0.773 | 0.758 | 0.843 |
| Hilar prominence | 0.697 | 0.710 | 0.749 | 0.736 | 0.852 |
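For readers unfamiliar with the metric, the sketch below shows how an AUC of this kind is typically computed: binary reference-standard labels are compared against an algorithm's confidence scores for one finding. The labels, scores, and library choice (scikit-learn) are illustrative assumptions, not the study's actual data or statistical software.

```python
# Illustrative sketch only: ROC curve and AUC for one finding
# (e.g., pleural effusion). The values below are invented placeholders,
# not data from the PLOS One study.
from sklearn.metrics import roc_auc_score, roc_curve

# 1 = finding present per the consensus reference standard, 0 = absent
reference_labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]

# Algorithm's estimated probability that the finding is present on each radiograph
algorithm_scores = [0.91, 0.12, 0.78, 0.64, 0.33, 0.05, 0.88, 0.41, 0.57, 0.22]

auc = roc_auc_score(reference_labels, algorithm_scores)
fpr, tpr, thresholds = roc_curve(reference_labels, algorithm_scores)

print(f"Area under the ROC curve: {auc:.3f}")
```

An AUC of 1.0 would indicate perfect discrimination between radiographs with and without the finding, while 0.5 is equivalent to chance; the values in the table above fall between those extremes.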
The deep-learning algorithm had room for improvement, however, in assessing change on serial exams, particularly for pulmonary opacities.
"This may have been due to variations in radiographic technique, or patient-related factors (such as differences in inspiratory effort and patient rotation over serial radiographs) on the appearance of pulmonary opacities," the authors wrote.
Although the algorithm is helpful for enhancing interpretation accuracy, it isn't likely to replace radiologists, they noted.
"However, [the deep-learning] algorithm can expedite image interpretation in emergent situations where a trained radiologist is either unavailable or overburdened in busy clinical practices," they concluded. "It may also serve as a second reader for radiologists to improve their accuracy."