C-MIMI: Radiologists 'swarm' with AI to detect pneumonia

Sep 10, 2018


Are radiologists and artificial intelligence (AI) better together than apart? The answer is yes, at least for diagnosing pneumonia on chest x-rays, according to research presented on Monday at the 2018 Conference on Machine Intelligence in Medical Imaging (C-MIMI) in San Francisco.

In a study involving eight radiologists at different locations working together in an "intelligent swarm" using AI-based crowdsourcing software, researchers from Stanford University and AI collaboration software developer Unanimous AI found that radiologists achieved higher diagnostic accuracy when working together rather than on their own. The collective radiologist performance was also superior to that of a separate machine-learning algorithm that had previously been shown to significantly outperform individual radiologists in diagnosing pneumonia, the researchers said in the talk at C-MIMI, which is sponsored by the Society for Imaging Informatics in Medicine (SIIM).

The eight participating radiologists initially assessed a set of 50 chest x-rays individually for the presence of pneumonia; next, they reviewed them together in real time with the help of Swarm AI software (Unanimous AI), which enabled them to work as a "hive mind," according to Dr. Safwan Halabi of Stanford and colleagues. Ultimately, the radiologists converged on a probabilistic diagnosis that reflected the likelihood that the patient had pneumonia. Afterward, the same 50 chest x-rays were processed by CheXNet -- a convolutional neural network previously developed by Stanford -- which likewise produced a probability of pneumonia for each image.

The researchers compared the two sets of probabilities against the ground truth using three statistical measures: binary classification of pneumonia (using a probability of 50% as the classification cutoff), receiver operating characteristic (ROC) analysis, and mean absolute error (the average absolute difference between the ground truth and the predicted probability).
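As a rough illustration of the three measures, here is a minimal Python sketch. It is not the researchers' code: the function names are hypothetical, the six-case dataset is invented for the example, and treating a probability of exactly 50% as positive is an assumption the study description does not settle.

```python
def binary_accuracy(truth, probs, cutoff=0.5):
    """Share of cases where the thresholded probability matches the label.

    Assumption: a probability equal to the cutoff counts as positive.
    """
    hits = sum((p >= cutoff) == bool(t) for t, p in zip(truth, probs))
    return hits / len(truth)


def roc_auc(truth, probs):
    """Area under the ROC curve, computed as the chance that a randomly
    chosen positive case gets a higher probability than a randomly chosen
    negative case (ties count as half)."""
    pos = [p for t, p in zip(truth, probs) if t == 1]
    neg = [p for t, p in zip(truth, probs) if t == 0]
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in pos
        for b in neg
    )
    return wins / (len(pos) * len(neg))


def mean_absolute_error(truth, probs):
    """Average of |ground truth - predicted probability| over all cases."""
    return sum(abs(t - p) for t, p in zip(truth, probs)) / len(truth)


# Invented toy data, NOT from the study: 1 = pneumonia, 0 = no pneumonia.
truth = [1, 0, 1, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.7, 0.4, 0.1]

print("accuracy:", binary_accuracy(truth, probs))
print("AUC:", roc_auc(truth, probs))
print("MAE:", mean_absolute_error(truth, probs))
```

Higher is better for accuracy and AUC, while lower is better for mean absolute error, which is why the three measures are reported separately.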

Performance for diagnosing pneumonia on chest x-rays
Measure	CheXNet	Real-time radiologist collaboration enabled by AI
Binary classification of pneumonia	60% diagnostic accuracy	82% diagnostic accuracy
Area under the curve (AUC)	0.708	0.906

Mean absolute error analysis showed that the "swarm" of radiologists was 22% more accurate than CheXNet, according to the researchers. All differences were statistically significant (p < 0.01 for binary classification and ROC analysis; p < 0.001 for mean absolute error).

"Diagnosing pathologies like pneumonia from chest x-rays is extremely difficult, making it an ideal target for AI technologies," said co-author Dr. Matthew Lungren of Stanford in a statement. "The results of this study are very exciting as they point toward a future where doctors and AI algorithms can work together in real-time, rather than human practitioners being replaced by automated algorithms."

Notably, the crowdsourcing software could also lead to more accurate ground-truth datasets used to train AI systems such as CheXNet, according to first author Halabi. This could potentially facilitate future breakthroughs in AI.

"Ground-truth datasets are always a challenge for training AI systems in radiology as they depend on human judgment," Halabi said in the Stanford statement. "This new technology may enable us to generate more accurate datasets and increase the accuracy of all systems that use machine learning to train on medical datasets."