False-positive CAD marks don't hinder VC readers' accuracy

Apr 20, 2008

Technology may get top billing in developing computer-aided detection (CAD) schemes, but psychology also plays a leading role. Few researchers have delved into the science of readers' reactions to CAD detections, but the human filter that either accepts or dismisses CAD marks is a crucial consideration in designing and implementing CAD software.

In this vein, researchers from St. Mark's Hospital in London wondered if a larger number of CAD false-positive marks would mislead readers of virtual colonoscopy (VC or CT colonography [CTC]) datasets.

The question is best understood in the context of the software itself. When CAD systems are configured to maximize lesion detection (i.e., higher sensitivity), they generally produce more false positives (i.e., lower specificity) as a by-product of that increased detection power.
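The trade-off can be illustrated with a minimal sketch. The candidate scores below are hypothetical and are not taken from the study or from any CAD product; the point is only that lowering the display threshold raises sensitivity and the false-positive count together.

```python
# Hypothetical CAD candidates: (confidence score, is it a true polyp?).
# These values are invented for illustration only.
candidates = [
    (0.95, True), (0.80, True), (0.60, False), (0.55, True),
    (0.40, False), (0.35, False), (0.30, True), (0.20, False),
]

def operating_point(threshold):
    """Return (sensitivity, false-positive count) when every candidate
    scoring at or above `threshold` is shown to the reader as a CAD mark."""
    shown = [(score, truth) for score, truth in candidates if score >= threshold]
    true_total = sum(1 for _, truth in candidates if truth)
    true_positives = sum(1 for _, truth in shown if truth)
    false_positives = sum(1 for _, truth in shown if not truth)
    return true_positives / true_total, false_positives

# Sweep from a strict threshold to a permissive one.
for threshold in (0.9, 0.5, 0.25):
    sensitivity, false_positives = operating_point(threshold)
    print(f"threshold={threshold:.2f}  sensitivity={sensitivity:.2f}  "
          f"false positives={false_positives}")
```

In this toy example the strictest threshold shows one true polyp and no false marks, while the most permissive one shows all four polyps at the cost of three false marks.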

"The reported number of CAD false positives does vary between systems ... but there's always a trade-off between sensitivity and specificity," said Dr. Stuart Taylor from University College Hospital in London. "We still don't know the absolute number of false positives which adversely influences reader performance, particularly in a population with a low-prevalence screening population where CAD may have the largest role."

Taylor, along with Rebecca Greenhalgh, Dr. Rajapandian Ilangovan, Dr. Steve Halligan, and colleagues sought to assess the effects of a larger number of CAD-generated false-positive detections on VC reader specificity and reporting times in a patient population with a low prevalence of cancer. They collected their data from an ongoing screening study, and presented their results at the 2008 European Congress of Radiology (ECR) in Vienna.

The researchers applied a CAD system (ColonCAD API 2.0, Medicsight, London) to 48 normal datasets that were divided into two groups of 24 patients each, with one group consisting of cases with fewer than 15 false positives each and another representing cases with more than 15 false positives. They then assessed whether radiologists were more accurate reading one group or the other, with the study powered to detect a 20% increase in reader accuracy.

All patients had undergone conventional cathartic bowel cleansing prior to virtual colonoscopy, consisting of 45 mL of sodium phosphate solution, plus a 2% barium suspension (250 mL) and diatrizoate (60 mL), Taylor said.

Sensitivity, specificity, and receiver operating characteristic (ROC) curves were calculated with and without CAD assistance. The relationship between the number of CAD false positives and reader confidence, reporting times, and correct dataset classification was analyzed using linear and logistic regression.

The four readers were all trained in CTC interpretation and familiar with the CAD software, as well as the review workstation (Vitrea 2 version 3.9, Vital Images, Minnetonka, MN), Taylor said. They were blinded to the purpose of the study and the prevalence of abnormalities, but they were told that they were examining a screening population, he said.

"They read without CAD, they reported what they found, said how confident they were, whether the (case) was normal or abnormal, and timed themselves," Taylor explained. "After the unassisted read, they turned CAD on as a second reader, wrote the additional findings, posted confidence scores, and whether (the case) was normal or not."

CAD detected all three polyps 6 mm and larger. After CAD was turned on, overall sensitivity rose while specificity dipped (see chart below).

Read	Overall sensitivity	Reader 1 sensitivity	Reader 2 sensitivity	Overall specificity	Reader 1 specificity	Reader 2 specificity
Pre-CAD	75%	43%	95%	96%	91%	98%
Post-CAD	83%	52%	98%	93%	88%	96%

Before CAD, the readers made eight false-positive detections across 192 reads (4 readers x 48 datasets), he said. Turning on CAD added four more false-positive detections (2.1% of reads) for a total of 12. Unfortunately, the four additional false positives were of the same types the readers had already reported unassisted: one case of redundant mucosa, two bulbous folds, and one case of fecal residue, he said.
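The arithmetic behind those figures can be checked with a short sketch. It assumes, as the numbers suggest, that the 2.1% is the four CAD-prompted false positives divided by all 192 reads; the variable names are ours, not the researchers'.

```python
READERS = 4
DATASETS = 48
total_reads = READERS * DATASETS  # every reader read every dataset

pre_cad_false_positives = 8   # reported before CAD was turned on
added_by_cad = 4              # extra false positives after CAD was turned on
post_cad_false_positives = pre_cad_false_positives + added_by_cad

def pct(count, total):
    """Share of reads, as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)

print("total reads:", total_reads)
print("added by CAD, % of reads:", pct(added_by_cad, total_reads))
print("total false positives post-CAD:", post_cad_false_positives)
```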

CAD detected the same types of false positives as the unassisted readers. All images and data courtesy of Dr. Stuart Taylor.

"Why did they not get 100% sensitivity when CAD detected all polyps? Well, CAD correctly put a mark on (one) polyp (example), but for whatever reason some of the readers didn't see it despite a correct CAD prompt," Taylor said.

CAD had its benefits, too. There was a small but statistically significant increase in reader confidence post-CAD versus the pre-CAD analysis (2.1 [1.3, 2.8], p < 0.001), and a corresponding small increase in the area under the curve (AUC) from pre-CAD to post-CAD (0.57 [0.34, 0.80] versus 0.61 [0.42, 0.80]).

CAD can help: Before CAD, two readers incorrectly reported a bulbous fold as cancer, while only one reader called it a positive finding post-CAD.

CAD can hurt: Before CAD, a reader correctly dismissed redundant mucosa in the rectum (hemorrhoid). However, after CAD incorrectly marked the finding as positive, the reader incorrectly reported it as a small rectal polyp.

In addition, "with more CAD false positives, the readers didn't get less confident as to whether the case was normal or abnormal" (p = 0.71), Taylor said. "Also, the increased number of CAD false positives didn't correlate with whether the reader called the case normal or abnormal." There was no change in the odds of correct reader case classification per additional CAD mark (p = 0.23).

On the other hand, there was a small but significant positive correlation between the number of CAD false positives and reading time (0.06 [0.02, 0.10], p = 0.002). The pre-CAD mean reading time was 8.6 minutes. With the addition of CAD, cases with fewer than 15 false positives added a mean 3.3 minutes to reading time, while cases with more than 15 false positives added a mean 3.9 minutes, or about four seconds per CAD mark, Taylor said.
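A rough sketch shows how those timing figures fit together. It assumes the 0.06 coefficient is expressed in minutes of reading time per additional CAD mark, which the article does not state outright but which agrees with the quoted four seconds per mark.

```python
# Assumption: the regression coefficient is in minutes per additional CAD mark.
minutes_per_mark = 0.06
seconds_per_mark = minutes_per_mark * 60

pre_cad_minutes = 8.6        # mean unassisted reading time
added_low_fp_minutes = 3.3   # mean added time, cases with fewer than 15 false positives
added_high_fp_minutes = 3.9  # mean added time, cases with more than 15 false positives

# Extra cost of the high-false-positive group over the low group, in seconds.
extra_gap_seconds = (added_high_fp_minutes - added_low_fp_minutes) * 60

print(f"per mark: about {seconds_per_mark:.1f} seconds")
print(f"total, low-FP cases: {pre_cad_minutes + added_low_fp_minutes:.1f} minutes")
print(f"total, high-FP cases: {pre_cad_minutes + added_high_fp_minutes:.1f} minutes")
print(f"gap between groups: about {extra_gap_seconds:.0f} seconds")
```

Under that assumption the per-mark cost works out to roughly 3.6 seconds, which rounds to the four seconds Taylor cited.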

"There was no evidence that increases in CAD false positives affected either reader confidence or correct reader classification, though it did increase reporting time slightly," Taylor concluded.

Future studies will rank the "plausibility" of CAD false positives, and determine whether CAD is able to maintain its net positive effect in a population with a low prevalence of disease.

By Eric Barnes
AuntMinnie.com staff writer
April 21, 2008
