Pilot study: Entropy-driven CAD zips through vast breast image database

Aug 1, 2006

An experimental computer-aided detection (CAD) system can speedily match a new mammogram with the most relevant cases in a database of breast images, according to a presentation Tuesday at the American Association of Physicists in Medicine (AAPM) meeting in Orlando.

Georgia Tourassi, Ph.D., and colleagues at Duke University in Durham, NC, have developed a knowledge-based CAD system (KB-CAD) that analyzes breast lesions using the principles of information theory. The system selects the most informative cases within seconds using image entropy, an indexing measure based on how the light and dark pixel values are distributed within an image region.

"Our system calculates the entropy of the query region and proceeds with the decision. Entropy is a measure of uncertainty in the region; uncertainty based on the distribution of the pixel gray values in the region," Tourassi explained in an e-mail to AuntMinnie.com.
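To show what this means numerically, the short Python sketch below computes the Shannon entropy of a region's gray-level histogram. It is a generic illustration rather than Duke's code: the function name `region_entropy`, the 256-bin histogram, and the use of bits (log base 2) are assumptions. A perfectly flat region scores zero, while a region whose gray values are spread evenly scores close to the 8-bit maximum for 256 bins.

```python
import numpy as np

def region_entropy(region, bins=256, value_range=(0, 256)):
    """Shannon entropy (bits) of the gray-level histogram of an image region.

    Assumes integer gray values in [0, 255]; the bin count is an illustrative
    choice, not the KB-CAD system's actual setting.
    """
    counts, _ = np.histogram(region, bins=bins, range=value_range)
    p = counts[counts > 0] / counts.sum()  # drop empty bins: 0 * log 0 := 0
    return float(np.sum(p * np.log2(1.0 / p)))

flat = np.full((64, 64), 120)                      # no gray-level variability
rng = np.random.default_rng(0)
textured = rng.integers(0, 256, size=(64, 64))     # gray values spread evenly

print(region_entropy(flat))      # 0.0
print(region_entropy(textured))  # just under the 8-bit maximum
```

Low variability means low entropy, which is the property an entropy index can exploit to group similar-looking regions.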

For this pilot study, the group used a database of 2,318 mammographic regions consisting of 905 masses and 1,413 normal cases. Tourassi broadly described the steps that the system goes through:

1. The entropy of the query region is calculated.
2. An interim set of cases with entropy similar to that of the query is retrieved from the knowledge database.
3. The system compares the query region with the retrieved cases using a more detailed analysis based on mutual information.
4. A decision is made.
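Steps 1 and 2 amount to a cheap prefilter ahead of the expensive comparison. A minimal Python sketch, assuming each stored case carries an entropy value computed when it was added, and that "similar" means the smallest absolute entropy difference (the article does not specify the criterion). `Case` and `interim_set` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Case:
    case_id: str
    label: str      # e.g., "mass" or "normal"
    entropy: float  # computed once, when the case is stored

def interim_set(query_entropy, knowledge_base, k):
    """Return the k stored cases whose entropy is closest to the query's.

    Only these cases go on to the mutual-information comparison (step 3);
    ties keep their original database order because sorted() is stable.
    """
    return sorted(knowledge_base, key=lambda c: abs(c.entropy - query_entropy))[:k]

kb = [Case("a", "mass", 6.1), Case("b", "normal", 4.0),
      Case("c", "mass", 5.8), Case("d", "normal", 6.4)]
print([c.case_id for c in interim_set(6.0, kb, k=2)])  # ['a', 'c']
```

In the pilot study the interim set held 600 cases; k=2 simply keeps the toy example readable.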

Using this entropy-based indexing, the group then compared a sample image to the top 600 most informative cases, which reduced the system's processing time to three seconds (ROC area index, A_z = 0.91 ± 0.01), according to the results.

In an e-mail interview with AuntMinnie.com, Tourassi, who is an associate research professor in Duke's radiology department, shared more detailed information about the KB-CAD system, as well as her group's continuing investigation.

AuntMinnie: The KB-CAD system has been described as similar to the search engine Google in that it retrieves the most useful and relevant information first and offers those images for comparison. Is that an accurate assessment?

The Google analogy applies to our study in the sense that using brute force to compare a query case with every case in the knowledge database becomes far too slow as the database grows, because the similarity measure we use (i.e., mutual information) is computationally expensive. A more intelligent approach is needed to sift through the database and find the few potentially relevant cases that should be examined more carefully. Our AAPM study examines how image entropy can be used to sift through the database quickly.

Display of the relevant images for visual assessment is not the primary goal of our CAD system, although this is the next direction. The retrieved images are used to make a decision regarding the query case. The decision is based on a knowledge-based algorithm, which checks how well the query case matches the retrieved abnormal cases versus the retrieved normal cases. If the query matches the retrieved abnormal cases better, then we make an "abnormal" decision for the unknown query case.
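One simple way to express a rule of this kind, offered as an assumption rather than the published algorithm: average the query's similarity to the retrieved abnormal cases, subtract its average similarity to the retrieved normal cases, and call the query abnormal when the difference exceeds a threshold. The name `kb_decision`, the averaging, and the default threshold of 0 are all hypothetical.

```python
from statistics import mean

def kb_decision(retrieved, threshold=0.0):
    """Decide "abnormal" or "normal" for a query from its retrieved cases.

    retrieved: (label, similarity) pairs, where label is "abnormal" or
    "normal" and similarity is the query-to-case score (e.g., MI).
    Returns the decision and the decision index it was based on.
    """
    abnormal = [s for label, s in retrieved if label == "abnormal"]
    normal = [s for label, s in retrieved if label == "normal"]
    if not abnormal or not normal:
        raise ValueError("need at least one retrieved case of each class")
    # Positive index: the query resembles the abnormal cases more than the normal ones.
    index = mean(abnormal) - mean(normal)
    return ("abnormal" if index > threshold else "normal"), index

decision, index = kb_decision([("abnormal", 0.42), ("abnormal", 0.38),
                               ("normal", 0.30), ("normal", 0.34)])
print(decision)  # abnormal
```

Sweeping `threshold` from low to high trades sensitivity against specificity; plotting those trade-offs gives an ROC curve, whose area is the kind of figure reported as A_z in the pilot study.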

We have not looked closely at the visual content of retrieved cases. Perceptual similarity and diagnostic similarity are two different concepts. Ideally, a successful image retrieval system should be able to achieve both simultaneously, but this is a topic of ongoing research in content-based image retrieval.

With regard to image entropy in a mammogram, do the light and dark pixels relate to lesions, masses, calcifications?

If there is little variability of gray values in the region, then its entropy is low.... The whole region is used to estimate the entropy; thus the pixels correspond to either an abnormality or normal parenchyma. Thus far our system deals only with the detection of masses and architectural distortion, not calcifications (although areas of masses with associated calcifications are included in our analysis).

How does the KB-CAD system differ from CAD systems that are currently on the market?

The commercial systems are based on proprietary algorithms, so it's difficult to know exactly. However, based on available information, I would say that the main difference is that our system does not rely on image features to make decisions.

Image similarity is assessed using information theory and, in particular, mutual information (MI). MI is a measure of statistical dependence between two images. Thus we avoid issues of feature selection and extraction. This is mainly what separates our KB-CAD system from (other) knowledge-based systems. We do not rely on image features.
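Mutual information can be estimated from the joint histogram of gray values in two equally sized regions: it is high when one region's pixel values say a lot about the other's, and near zero when the two are unrelated. The sketch below uses NumPy's `histogram2d`; the function name `mutual_information` and the 32-bin setting are illustrative assumptions, and the KB-CAD system may use a different or normalized MI variant.

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Estimate MI (in bits) between two equally sized gray-level regions
    from their joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)  # marginal distribution of a
    p_b = p_ab.sum(axis=0, keepdims=True)  # marginal distribution of b
    nz = p_ab > 0                          # skip empty cells (0 * log 0 := 0)
    return float(np.sum(p_ab[nz] * np.log2(p_ab[nz] / (p_a * p_b)[nz])))

rng = np.random.default_rng(1)
query = rng.integers(0, 256, size=(64, 64))
similar = np.clip(query + rng.integers(-10, 11, size=query.shape), 0, 255)  # query plus mild noise
unrelated = rng.integers(0, 256, size=(64, 64))
print(mutual_information(query, similar) > mutual_information(query, unrelated))  # True
```

No features are extracted at any point: the score comes directly from the joint distribution of pixel values, which is the distinction the answer above draws.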

Does the KB-CAD affect the accuracy of image interpretation, such as reducing false positives or helping readers determine benign from malignant lesions?

One of the advantages of knowledge-based image analysis is that we can keep depositing new cases into the knowledge database without worrying about retraining the decision algorithm, as other CAD techniques, based on neural networks or linear discriminant analysis (LDA), need to do.

A comprehensive knowledge database is a critical component for KB analysis. However, if the system blindly compares the unknown query case to everything stored in its knowledge database, then the approach will become computationally inefficient as the database increases in size.
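A sketch of why an entropy index suits a growing database, assuming a hypothetical `EntropyIndex` class: depositing a case is just a sorted insertion (nothing is retrained), and a lookup uses binary search to collect the k cases with the closest entropy, so the number of MI comparisons stays at k however large the database becomes.

```python
import bisect

class EntropyIndex:
    """Hypothetical knowledge database kept sorted by case entropy."""

    def __init__(self):
        self._entropies = []  # sorted entropy keys
        self._cases = []      # case ids, kept parallel to _entropies

    def deposit(self, case_id, entropy):
        # Sorted insertion; Python's list.insert is linear time, so a large
        # production index might use a tree or database index instead.
        i = bisect.bisect_left(self._entropies, entropy)
        self._entropies.insert(i, entropy)
        self._cases.insert(i, case_id)

    def nearest(self, query_entropy, k):
        """Return up to k case ids whose entropy is closest to the query's."""
        n = len(self._entropies)
        lo = hi = bisect.bisect_left(self._entropies, query_entropy)
        picked = []
        while len(picked) < k and (lo > 0 or hi < n):
            # Take whichever neighbor (left or right) is closer in entropy.
            take_left = hi >= n or (
                lo > 0 and query_entropy - self._entropies[lo - 1]
                <= self._entropies[hi] - query_entropy)
            if take_left:
                lo -= 1
                picked.append(self._cases[lo])
            else:
                picked.append(self._cases[hi])
                hi += 1
        return picked

kb = EntropyIndex()
for case_id, h in [("a", 6.1), ("b", 4.0), ("c", 5.8), ("d", 6.4)]:
    kb.deposit(case_id, h)
print(kb.nearest(6.0, k=2))  # ['a', 'c']
kb.deposit("e", 5.95)        # a new case: no retraining, just an insertion
print(kb.nearest(6.0, k=2))  # ['e', 'a']
```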

We propose the entropy-indexing technique to find an interim set of relevant cases before we proceed with detailed comparisons that need the calculation of MI. MI calculations are expensive. Entropy calculations are not. In addition, entropy (still) needs to be calculated as part of the MI calculation. That's why entropy is a natural indexing choice for our information-theoretic system.
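The textbook identity linking mutual information to entropy makes this point concrete. For a query region A and a stored region B with gray-level distributions p_A and p_B and joint distribution p_AB (the system may use a normalized variant rather than this exact form):

```latex
\begin{aligned}
H(A)   &= -\sum_{a} p_A(a)\,\log p_A(a), \qquad H(B) = -\sum_{b} p_B(b)\,\log p_B(b),\\
H(A,B) &= -\sum_{a,b} p_{AB}(a,b)\,\log p_{AB}(a,b),\\
I(A;B) &= H(A) + H(B) - H(A,B).
\end{aligned}
```

Under this formulation, H(A) is already available from the indexing step and H(B) can be stored with each case, so only the joint entropy H(A,B) has to be computed for each query-case pair.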

Please note that I call the system both information theoretic (IT) and knowledge-based (KB). The IT label is due to the use of information-theoretic principles to do image retrieval and image similarity assessment. The KB label is due to the use of a knowledge-based algorithm to make decisions based on the retrieved cases.

Do you envision KB-CAD as a standalone system that can compete with other CAD software, or can it be used in addition to "traditional" CAD?

Both, and that's how we intend to test it clinically. Since it is designed to scrutinize mammographic locations, the system could be used in an interactive manner: The radiologist could launch it by clicking on an image location that looks suspicious and (acquire) a second opinion. In this capacity, our system could reduce errors in mammographic interpretation. Or the system could be used as an add-on to other feature-based CAD systems for false-positive reduction.

What will be your next phase of research?

We are still in the development stage. We have proof-of-concept and several successful feasibility studies. We are now focusing on optimizing the key parameters of the system to ensure the best possible accuracy and speed of analysis.

From a scientific point of view, we are also looking at how well the system translates among different digitizers and also from digitized (film-screen mammograms) to (full-field) digital mammograms. Preliminary results are extremely promising, suggesting a flexibility not seen with other feature-based CAD systems.

From a clinical point of view, the next phase is a preclinical study with radiologists using the system in the two capacities mentioned above. We want to test whether this new interactive CAD paradigm is clinically more effective.

By Shalmali Pal
AuntMinnie.com staff writer
August 2, 2006
