Breast ultrasound CAD performance varies in ethnic populations

Sep 4, 2008

A pair of new studies on a prototype breast ultrasound computer-aided detection (CAD) application has produced intriguing results. They indicate that the software's performance can vary based on factors such as the clinical environment in which it's used and possibly even the ethnicity of patients.

Both studies tested a breast ultrasound CAD algorithm under development at the University of Chicago. In the first study, published in the August issue of Radiology, the researchers tested the software's accuracy on a multiracial patient population from the U.S. In the second study, presented at the American Association of Physicists in Medicine (AAPM) annual meeting in Houston, they tested the algorithm on data collected from an ethnically homogeneous group of patients in South Korea and on a comparably sized subset of the multiracial patient database.

Although in theory the algorithm should have delivered somewhat similar results regardless of the patient population, in reality it produced different accuracy ratings based on the different datasets -- leaving the researchers scratching their heads for an explanation.

For the Radiology study (August 2008, Vol. 248:2, pp. 392-397), a University of Chicago team led by Karen Drukker, Ph.D., tested the algorithm using a database of 2,266 breast ultrasound images derived from examinations performed at hospitals in the greater metropolitan Chicago area. The images covered 1,046 distinct abnormalities in 508 patients.

In the dataset, 183 patients, or 36% of the cohort, were referred to biopsy, while 157 cancerous lesions were identified in 101 patients. The most prevalent lesion type was cystic, with most being small subcentimeter cysts.

The algorithm's ability to distinguish malignant from benign lesions was measured using receiver operating characteristic (ROC) analysis. For the biopsied lesions, the software obtained an area under the ROC curve (AUC) of 0.88 and reached 100% sensitivity at 26% specificity. The specificity for the radiologists who made independent determinations was zero at 100% sensitivity, because all of these lesions had been biopsied.

The software's AUC value improved to 0.90 when all lesions, both benign and malignant, were included, and 100% sensitivity was achieved at 30% specificity. The specificity for the radiologists at 100% sensitivity was 77%. The researchers pointed out that the radiologists had a great advantage when deciding whether to recommend a biopsy, because they had access to the patients' mammograms and clinical history.
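The two figures quoted above, AUC and specificity at 100% sensitivity, can be illustrated with a minimal Python sketch. The scores below are made up for illustration and are not taken from either study; the function names are hypothetical, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation rather than whatever ROC fitting method the researchers used.

```python
def auc(malignant_scores, benign_scores):
    """AUC as the probability that a randomly chosen malignant lesion
    scores higher than a randomly chosen benign one (ties count half)."""
    wins = 0.0
    for m in malignant_scores:
        for b in benign_scores:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant_scores) * len(benign_scores))


def specificity_at_full_sensitivity(malignant_scores, benign_scores):
    """Set the threshold at the lowest malignant score so that no cancer
    is missed, then count the benign lesions that fall below it."""
    threshold = min(malignant_scores)
    cleared = sum(1 for b in benign_scores if b < threshold)
    return cleared / len(benign_scores)


# Made-up CAD output scores (higher means more suspicious).
malignant = [0.9, 0.8, 0.7, 0.4]
benign = [0.6, 0.5, 0.3, 0.2, 0.1]

print("AUC:", auc(malignant, benign))
print("Specificity at 100% sensitivity:",
      specificity_at_full_sensitivity(malignant, benign))
```

The same arithmetic shows why a reader who sends every lesion to biopsy scores zero specificity: if every lesion is called positive, no benign lesion is cleared.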

Korean discrepancy

But in the AAPM paper, the breast ultrasound CAD software produced different results, according to University of Chicago researcher Nick Gruszauskas, Ph.D. Gruszauskas and colleagues applied the algorithm to a database of 462 ultrasound images (145 malignant and 317 benign) from examinations performed on Asian women in Seoul, South Korea. In contrast to the earlier work with the multiracial U.S. dataset, this was the first time that the CAD software had been tested with a database representing a single ethnic patient population, Gruszauskas said.

As a point of comparison, the researchers tested the CAD algorithm on a subset of the University of Chicago database consisting of 433 lesions (127 malignant, 306 benign), using the same protocols and methodologies described in the Radiology article.

For this analysis, the group used a CAD testing protocol called round-robin analysis, in which the CAD algorithm is trained on every case in a dataset but one, and then tested using the one case that's been left out.
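As a rough illustration of the round-robin (leave-one-out) protocol, the Python sketch below holds out each case in turn, trains on the rest, and scores the held-out case. The single-feature data and the nearest-mean "classifier" are assumptions made up for this example; they stand in for the University of Chicago algorithm, whose internals are not described here.

```python
def train(cases):
    """Toy training step: mean feature value of each class.
    cases is a list of (feature_value, is_malignant) pairs."""
    malignant = [x for x, is_mal in cases if is_mal]
    benign = [x for x, is_mal in cases if not is_mal]
    return sum(malignant) / len(malignant), sum(benign) / len(benign)


def score(model, x):
    """Positive when x lies closer to the malignant mean than the benign mean."""
    malignant_mean, benign_mean = model
    return abs(x - benign_mean) - abs(x - malignant_mean)


def round_robin(cases):
    """Train on every case but one, score the one left out, repeat for all."""
    results = []
    for i, (x, is_mal) in enumerate(cases):
        training_set = cases[:i] + cases[i + 1:]  # everything except case i
        model = train(training_set)
        results.append((score(model, x), is_mal))
    return results


# Made-up feature values: three benign and three malignant lesions.
cases = [(0.2, False), (0.3, False), (0.4, False),
         (0.7, True), (0.8, True), (0.9, True)]

for held_out_score, is_mal in round_robin(cases):
    print("malignant" if is_mal else "benign   ", round(held_out_score, 3))
```

Because each case is scored by a model that never saw it, the held-out scores can be pooled into a single ROC analysis for the whole dataset.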

When round-robin analysis was applied to the U.S. and Korean datasets, the results were as the researchers expected, Gruszauskas reported to AAPM attendees. The CAD software produced an AUC of 0.89 when analyzing the Chicago database and an AUC of 0.90 when analyzing the Korean database.

But things changed when the researchers applied a different CAD evaluation method, called independent testing, to the datasets. With independent testing, the CAD system is trained using one database and then tested using a completely separate database.

Independent testing is believed to be a more rigorous method of CAD analysis that more closely approximates real-world conditions, Gruszauskas said.
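Independent testing can be sketched in Python along the following lines. Everything in the block is an assumption for illustration: the two small "site" databases, the two features (labeled shadowing and margin), and the nearest-centroid style classifier are invented and do not come from the study.

```python
def centroid(rows):
    """Per-feature mean of a list of feature vectors."""
    return [sum(r[i] for r in rows) / len(rows) for i in range(len(rows[0]))]


def train(cases):
    """Toy training: weights are the malignant centroid minus the benign one.
    cases is a list of (features, is_malignant) pairs."""
    malignant = centroid([f for f, is_mal in cases if is_mal])
    benign = centroid([f for f, is_mal in cases if not is_mal])
    return [m - b for m, b in zip(malignant, benign)]


def auc(scored):
    """Pairwise AUC over (score, is_malignant) pairs; ties count half."""
    mal = [s for s, is_mal in scored if is_mal]
    ben = [s for s, is_mal in scored if not is_mal]
    wins = sum(1.0 if m > b else 0.5 if m == b else 0.0
               for m in mal for b in ben)
    return wins / (len(mal) * len(ben))


def independent_test(train_db, test_db):
    """Train on one database, then score a completely separate one."""
    weights = train(train_db)
    scored = [(sum(w * x for w, x in zip(weights, f)), is_mal)
              for f, is_mal in test_db]
    return auc(scored)


# Made-up (shadowing, margin) features. At site A shadowing separates the
# classes; at site B the margin feature does most of the work.
site_a = [([0.1, 0.5], False), ([0.2, 0.3], False), ([0.3, 0.4], False),
          ([0.7, 0.5], True), ([0.8, 0.6], True), ([0.9, 0.4], True)]
site_b = [([0.45, 0.1], False), ([0.35, 0.2], False), ([0.65, 0.3], False),
          ([0.5, 0.7], True), ([0.6, 0.8], True), ([0.4, 0.9], True)]

print("Train A, test B:", round(independent_test(site_a, site_b), 3))
print("Train B, test A:", round(independent_test(site_b, site_a), 3))
```

In this toy setup the two directions give different AUC values because each site leads the classifier to weight the features differently, which is the kind of mismatch independent testing is meant to expose.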

When the CAD system was trained with the Korean database and tested with the subset of the Chicago database, the AUC was 0.84. When the CAD system was trained on the subset of the Chicago database and tested with the Korean database, the AUC was 0.80. In a perfect world, the CAD algorithm would have produced AUC values that were closer together, as with the round-robin testing.

Gruszauskas believes that the discordant results may be attributed to two factors. The first is that in several Asian countries, breast ultrasound is often used as a breast cancer screening modality in addition to being an adjunct to diagnostic mammography. This necessitates different imaging protocols, which can affect the number and types of images saved for review, as well as the manner in which the exams are performed. The Korean database may therefore represent different protocols than the Chicago database.

The other factor is that the higher breast density of Asian women may have had an effect on sonographic feature extraction. Previous studies have shown that Asian women tend to have increased breast density, and this higher average density may change the average value of a feature in the CAD software that is linked to or dependent upon density.

"One of my theories is the measurement for shadowing isn't working as well," Gruszauskas said. "In North America, shadowing is an important factor, but in Asia it may not be as important. If I train the software to not give as much importance to shadowing with the Asian database, the results may be more comparable."

The University of Chicago research team is conducting additional tests and is preparing a journal paper that explores these results in depth. The team is also interested in extending its studies of the CAD software to other breast ultrasound databases, from both within and outside North America, that represent other ethnic populations, according to Maryellen Giger, Ph.D., professor of radiology at the University of Chicago.

By Cynthia Keen
AuntMinnie.com staff writer
September 5, 2008