A commercial AI system achieved high marks as a first reader in double-reading settings for breast cancer screening, a Danish study published December 20 in Cancer Imaging found.
Researchers led by Mohammad Elhakim, MD, from Odense University Hospital found that the AI system achieved high specificity and moderate sensitivity both as a standalone first reader and when integrated into combined reading. At the sensitivity-matched threshold, integrated AI improved sensitivity compared with combined reading, albeit with lower specificity.
“Replacing the first reader in double reading with AI could be feasible, but choosing an appropriate AI threshold is crucial to maintaining cancer detection accuracy and workload,” Elhakim and colleagues wrote.
Double reading is widely used in European breast cancer screening, but radiologist shortages and capacity constraints could make it hard to sustain. With this in mind, the Danish Health Authority recommended in 2022 that AI replace the first reader in double reading, provided the technology proves efficient.
Elhakim and co-authors evaluated a commercially available AI system (Transpara version 1.7.0, ScreenPoint Medical BV) in a Danish screening population. They assessed its detection accuracy in two scenarios in which AI replaced the first reader: standalone AI, compared with first reading, and integrated AI, in which the AI took the first reader's place in double reading alongside the second reader, compared with combined reading.
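The integrated scenario can be pictured as a simple decision rule per screen. The sketch below is illustrative only and rests on assumptions the article does not confirm: the AI "flags" a screen when its score is at or above a cutoff, a screen goes to arbitration whenever the AI and the second reader disagree, and the arbiter's call is final. The function names `integrated_decision` and `integrated_rates` are hypothetical.

```python
from typing import Iterable, Tuple


def integrated_decision(ai_score: float, cutoff: float,
                        second_reader_recall: bool,
                        arbiter_recall: bool) -> Tuple[bool, bool]:
    """Return (recall, arbitrated) for one screen.

    Assumed rule: the AI replaces the first reader and flags a screen when
    its score is at or above the cutoff. If the AI and the second reader
    agree, their shared decision stands; otherwise the arbiter decides.
    """
    ai_recall = ai_score >= cutoff
    if ai_recall == second_reader_recall:
        return ai_recall, False
    return arbiter_recall, True


def integrated_rates(screens: Iterable[Tuple[float, bool, bool]],
                     cutoff: float) -> Tuple[float, float]:
    """Recall and arbitration rates as fractions of all screens.

    Each screen is (ai_score, second_reader_recall, arbiter_recall); the
    arbiter's decision is only used when the two reads disagree.
    """
    n = recalls = arbitrations = 0
    for ai_score, second_recall, arbiter_recall in screens:
        recall, arbitrated = integrated_decision(
            ai_score, cutoff, second_recall, arbiter_recall)
        n += 1
        recalls += recall          # bools count as 0 or 1
        arbitrations += arbitrated
    if n == 0:
        raise ValueError("no screens supplied")
    return recalls / n, arbitrations / n


screens = [(0.9, True, True), (0.2, False, True),
           (0.8, False, False), (0.1, True, True)]
print(integrated_rates(screens, cutoff=0.5))  # (0.5, 0.5)
```

Lowering the cutoff makes the AI flag more screens, which under this rule tends to raise both the recall and the arbitration rate, the trade-off the study reports for the sensitivity-matched threshold.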
The team included data from 257,671 screening mammograms performed between 2014 and 2018. It applied two AI-score cutoff points: one matched to the average first-reader sensitivity, labeled AI (sensitivity), and one matched to the average first-reader specificity, labeled AI (specificity).
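Matching a cutoff to a reader's performance amounts to scanning candidate AI-score thresholds and picking the one whose sensitivity (or specificity) lands closest to the reader's value. The sketch below assumes a continuous score where higher means more suspicious and a screen is flagged when its score is at or above the cutoff; the names `sens_spec` and `match_cutoff` and the toy data are hypothetical, and the study's exact matching procedure may differ.

```python
import numpy as np


def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when screens scoring >= cutoff are flagged."""
    flagged = scores >= cutoff
    sensitivity = flagged[labels].mean()      # share of cancers flagged
    specificity = (~flagged[~labels]).mean()  # share of non-cancers not flagged
    return sensitivity, specificity


def match_cutoff(scores, labels, target, metric="sensitivity"):
    """Return the observed score whose cutoff gives the metric closest to target.

    metric is "sensitivity" or "specificity"; ties go to the lowest cutoff.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    if metric not in ("sensitivity", "specificity"):
        raise ValueError("metric must be 'sensitivity' or 'specificity'")
    idx = 0 if metric == "sensitivity" else 1
    candidates = np.unique(scores)  # sorted ascending
    values = np.array([sens_spec(scores, labels, c)[idx] for c in candidates])
    return float(candidates[np.argmin(np.abs(values - target))])


# Toy data: 1 = cancer, 0 = no cancer (illustrative only).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
print(match_cutoff(scores, labels, 0.3, "sensitivity"))  # 0.9
print(match_cutoff(scores, labels, 0.6, "specificity"))  # 0.3
```

The loop recomputes both metrics for every candidate, which is fine for a sketch; on a population of hundreds of thousands of screens, a sorted cumulative count (as ROC-curve routines use) would be the practical choice.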
The researchers reported that standalone AI (sensitivity) had lower specificity and positive predictive value (PPV), as well as a higher recall rate, than the first reader (p < 0.0001 for all). Standalone AI (specificity), meanwhile, had lower sensitivity (p < 0.0001), PPV (p = 0.01), and negative predictive value (NPV; p = 0.0002).
Compared with combined reading, integrated AI (sensitivity) achieved higher sensitivity (p = 0.0004), lower specificity and PPV, and higher recall and arbitration rates (p < 0.0001 for all).
Performance of AI in a double-reading setting

| Measure | First reader | Standalone AI (sensitivity) | Standalone AI (specificity) | Combined reading | Integrated AI (sensitivity) | Integrated AI (specificity) |
|---|---|---|---|---|---|---|
| Sensitivity | 63.7% | 63.7% | 58.6% | 73.9% | 76.2% | 74.6% |
| Specificity | 97.8% | 96.5% | 97.8% | 97.9% | 97.3% | 97.9% |
| Positive predictive value | 18.7% | 12.6% | 17.4% | 22% | 18.1% | 22% |
| Negative predictive value | 99.7% | 99.7% | 99.7% | 99.8% | 99.8% | 99.8% |
| Recall rate | 2.7% | 4% | 2.7% | 2.7% | 3.3% | 2.7% |
| Arbitration rate | N/A | N/A | N/A | 2.9% | 5.1% | 4% |

AI (sensitivity) refers to the AI-score cutoff point matched to the average first-reader sensitivity. AI (specificity) refers to the AI-score cutoff point matched to the average first-reader specificity.
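The measures in the table follow the standard confusion-matrix definitions. As a minimal sketch, the hypothetical `ScreeningOutcomes` class below computes them from counts of recalled and non-recalled screens with and without cancer; the counts in the example are made up, and treating missed cancers (for instance, later interval cancers) as false negatives is an assumption about the reference standard rather than a detail from the article.

```python
from dataclasses import dataclass


@dataclass
class ScreeningOutcomes:
    tp: int  # cancers that were recalled
    fp: int  # recalled screens without cancer
    fn: int  # cancers that were not recalled (assumed to include interval cancers)
    tn: int  # screens without cancer that were not recalled

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def metrics(self) -> dict:
        """Return each measure as a fraction; multiply by 100 for percent."""
        return {
            "sensitivity": self.tp / (self.tp + self.fn),
            "specificity": self.tn / (self.tn + self.fp),
            "ppv": self.tp / (self.tp + self.fp),
            "npv": self.tn / (self.tn + self.fn),
            "recall_rate": (self.tp + self.fp) / self.total,
        }


# Illustrative counts for 1,000 screens (not study data).
outcomes = ScreeningOutcomes(tp=8, fp=40, fn=2, tn=950)
for name, value in outcomes.metrics().items():
    print(f"{name}: {value:.1%}")
```

Because cancers are rare in a screening population, a small drop in specificity adds many false positives, which is why PPV and recall rate move noticeably even when specificity changes by only about a percentage point.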
At the specificity-matched threshold, integrated AI showed no significant difference from combined reading in any outcome measure apart from a slightly higher arbitration rate. Subgroup analyses also showed higher detection of interval cancers by standalone AI and integrated AI at both thresholds (p < 0.0001 for all), and the composition of detected cancers varied across multiple subgroups of tumor characteristics.
The study authors also noted "a general tendency of lower accuracy for screen-detected cancers and higher accuracy for interval cancers."
“Discrepancies in cancers detected by the AI system and radiologists could be harnessed to improve detection accuracy of particular subtypes of interval cancers by applying AI for decision support in double reading,” they wrote.