AI can be used to 'rule-out' breast cancer on mammography

Apr 10, 2024

A deep-learning algorithm can rule out the presence of breast cancer on screening mammograms, improving specificity and yielding significant workflow and downstream savings, according to research published April 10 in Radiology.

A team of investigators led by first author Stefano Pedemonte, PhD, of AI software developer Whiterabbit.ai, and senior author Richard Wahl, MD, of the Mallinckrodt Institute of Radiology, trained and tested a deep-learning algorithm using over 160,000 2D full-field digital mammography exams. They found their model could sharply reduce the number of screening mammograms requiring radiologist review and lower the number of false-positive results with minimal, if any, cost to sensitivity.

“The elimination of incorrect follow-up examinations and biopsies, which constitute major limitations of breast cancer screening today, benefits patients directly and is the most critical advantage of cancer rule-out technology,” the authors wrote.

The researchers retrospectively trained and tested the algorithm using datasets from two U.S. institutions and one U.K. institution:

U.S. dataset 1: 143,593 mammograms interpreted by 11 breast radiologists from 2008 to 2017
U.S. dataset 2: 1,362 mammograms interpreted by 59 radiologists from 2014 to 2019
U.K. dataset 3: 18,873 mammograms interpreted by 210 readers from 2011 to 2015

Datasets 1 and 3 were used mostly for training and validating the algorithm, with 10% set aside for testing. U.S. dataset 2 was used only for testing.

Malignancy probability

The AI algorithm calculates an examination malignancy probability after analyzing mammogram images, the patient’s age, prior mammogram images, and BI-RADS assessments for previous exams, if available. Exams with a malignancy probability below a threshold set for high sensitivity were deemed to be BI-RADS 1 (negative) and did not require radiologist interpretation.
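
The rule-out step described above can be sketched in a few lines of Python. This is only an illustration of thresholding on a malignancy probability, not the authors' implementation: the `Exam` class, the `triage` function, the probabilities, and the 0.05 threshold are all hypothetical, and the article does not report the actual operating point.

```python
from dataclasses import dataclass


@dataclass
class Exam:
    exam_id: str
    malignancy_probability: float  # hypothetical model output in [0, 1]


def triage(exams, threshold):
    """Split exams into ruled-out and radiologist-review lists.

    Exams scoring below the threshold are ruled out (treated as BI-RADS 1);
    all others still go to a radiologist.
    """
    ruled_out = [e for e in exams if e.malignancy_probability < threshold]
    needs_review = [e for e in exams if e.malignancy_probability >= threshold]
    return ruled_out, needs_review


# Made-up scores and an assumed threshold, for illustration only.
exams = [Exam("a", 0.002), Exam("b", 0.40), Exam("c", 0.01), Exam("d", 0.07)]
ruled_out, needs_review = triage(exams, threshold=0.05)

print([e.exam_id for e in ruled_out])     # ['a', 'c']
print([e.exam_id for e in needs_review])  # ['b', 'd']
```

A lower threshold rules out fewer exams but makes it less likely that a cancer falls below the cutoff, which is the trade-off the study's sensitivity analysis addresses.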

In addition to projecting how the algorithm would have affected workflow and accuracy on the test datasets, the researchers performed a clinical workflow simulation to estimate the downstream impact of the radiologist + AI paradigm.

Estimated performance and downstream impact of AI algorithm ruling out benign cases
	U.S. Dataset 1	U.K. Dataset 3	U.S. Dataset 2
Reduction in number of screening exams requiring radiologist interpretation	41.6%	36.8%	19.5%
Reduction in number of false-positive callbacks	31.1%	17.1%	11.9%
Reduction in number of benign biopsies	7.4%	5.9%	6.5%

All differences from the original radiologist interpretations were statistically significant, with the exception of the reduction in the number of benign biopsies on U.S. dataset 2 (p = 0.08).

Missed cases

The algorithm ruled out two cancer cases, both from U.K. dataset 3, that radiologists would otherwise have detected. However, the sensitivity of the AI rule-out workflow was statistically noninferior to that of the standard workflow, as was the cancer detection rate.

“Overall, while our device’s performance was lower than the radiologists at a similar operating point, rule-out is designed to complement the radiologist and operate at a point of high sensitivity, leading to improvements for the potential human+AI paradigm,” the authors wrote.

The authors also noted that reducing false positives may boost screening compliance, as false positives and patient anxiety have been linked to lower compliance rates. Fewer false positives would also spare patients the financial burden of follow-up exams and treatment, according to the researchers.

Lightening the load

What’s more, the algorithm could help lighten the load of radiologists and technologists, improving patient access in a time of workforce shortages.

“Lower utilization of diagnostic and biopsy studies could provide patients with more prompt access to definitive diagnosis,” the authors wrote. “Lastly, reducing radiologist workload can also mitigate burnout, address workforce shortages, and help expand nascent, underresourced screening systems.”

The authors pointed out that the U.S. Preventive Services Task Force’s (USPSTF) biennial screening recommendations were estimated to reduce false positives by 68% at the cost of lowered sensitivity and 30% fewer deaths averted.

“Consequently, many individuals would die of breast cancers that would have otherwise been found by more frequent screening,” the authors wrote. “However, our study showed that algorithms may achieve at least half of the reduction of the false-positive rate achieved by the USPSTF guidelines update while reducing the sensitivity by only 1%.”

The researchers noted that their technology may affect individual radiologist performance in clinical use, reducing the false-positive rate without substantially decreasing sensitivity. Radiologists with high false-positive rates benefited the most in the study.

“Our analysis also introduces the concept of relative sensitivity, meaning that the device still detects the cancers that radiologists detect,” the authors wrote. “We observe that even when the absolute sensitivity is 97%, the relative sensitivity can remain at 100%, suggesting that no new cancers would be lost in this paradigm.”
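
The difference between absolute and relative sensitivity can be made concrete with a small worked example. The sketch below follows one reading of the authors' description: absolute sensitivity counts all cancers the device does not rule out, while relative sensitivity counts only the cancers radiologists detected. The function names and case counts are hypothetical and are not taken from the study.

```python
def absolute_sensitivity(cancers, passed_by_device):
    """Fraction of all cancers that the device does NOT rule out."""
    return len(cancers & passed_by_device) / len(cancers)


def relative_sensitivity(radiologist_detected, passed_by_device):
    """Fraction of radiologist-detected cancers that the device does NOT rule out."""
    return len(radiologist_detected & passed_by_device) / len(radiologist_detected)


# Hypothetical cohort: 100 cancers, labeled 0..99.
cancers = set(range(100))
# Assume radiologists detected cases 0..84 and missed cases 85..99.
radiologist_detected = set(range(85))
# Assume the device rules out cases 97, 98, and 99, which radiologists also missed.
passed_by_device = set(range(97))

print(absolute_sensitivity(cancers, passed_by_device))               # 0.97
print(relative_sensitivity(radiologist_detected, passed_by_device))  # 1.0
```

In this made-up cohort the device rules out three cancers, but because radiologists had missed all three anyway, no radiologist-detected cancer is lost.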

Importantly, quality assurance and monitoring systems will need to be developed to guarantee safe operation of these types of rule-out algorithms, according to the authors. In addition, the benefits to the patients, radiologists, and the healthcare system will need to be substantiated in other studies, they said.

“With these in place, rule-out devices could offer a safer and more effective alternative to improving screening than restrictive nationwide guideline changes,” the authors wrote.
