PITTSBURGH - A deep-learning technique shows promise for helping to classify breast density on mammography images, according to research presented on Friday at the Society for Imaging Informatics in Medicine (SIIM) annual meeting.
A team from the University of Pittsburgh developed a deep learning-based method that could accurately classify mammograms into two of the most challenging BI-RADS density categories: B (scattered density) and C (heterogeneously dense). The results represent encouraging performance on a distinction that radiologists often find difficult to make, according to presenter Aly Mohamed, PhD.
"We anticipate that our approach will provide a promising toolkit that will help enhance current clinical assessment of breast density," Mohamed said.
An important risk marker
An established risk marker for breast cancer, breast density is routinely assessed by radiologists on digital mammograms using BI-RADS, a qualitative system. Breasts classified as BI-RADS density categories C (heterogeneously dense) and D (extremely dense) are considered to be dense, while those in categories A (fatty breasts) and B (scattered density) are considered to be nondense.
This method is subjective, however, and has been found to suffer from reader variability. Radiologists can have particular difficulty in distinguishing between BI-RADS categories B and C, a critical distinction given that women will be triaged into either dense or nondense breast groups based on the categorization.
Recommendations for supplemental screening and risk management vary by breast density, so consistent assessment is highly desirable in the clinic, Mohamed said. Current software tools for quantitative assessment of breast density typically calculate area- or volume-based measures.
"These software tools lack clinical validation or are limited to a specific setting, such as working only on 'raw' images," he said.
Applying deep learning
As a result, the researchers sought to develop a deep learning-based classifier that could consistently distinguish between BI-RADS density categories B and C. They performed a retrospective study involving 1,427 women who had undergone standard digital mammography screening from 2005 to 2016 and whose exams had a known ground-truth assessment of either BI-RADS density B or C. They also included a large dataset of 22,000 cancer-free digital mammograms.
The mediolateral oblique (MLO) and craniocaudal (CC) views of both breasts were used, and the group trained on balanced sets of 500 to 7,000 images per breast density category. A separate dataset of 1,850 images -- 925 each for BI-RADS B and C -- was used to test the deep-learning model, which was based on a convolutional neural network (CNN).
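The presentation did not detail how the balanced training subsets were drawn or how the 1,850-image test set was held out; a minimal sketch of one way to organize such splits (the function name, random seed, and sampling scheme are illustrative assumptions) follows.

```python
# Hypothetical sketch of assembling balanced training data and a held-out
# test set, assuming image paths have already been grouped by the
# radiologist-assigned BI-RADS density category (B or C).
import random

def make_splits(paths_b, paths_c, n_train_per_class=7000, n_test_per_class=925, seed=0):
    """Draw a balanced, disjoint training set and test set per density category."""
    rng = random.Random(seed)
    splits = {}
    for label, paths in (("B", list(paths_b)), ("C", list(paths_c))):
        rng.shuffle(paths)
        splits[label] = {
            "test": paths[:n_test_per_class],  # held out, never used for training
            "train": paths[n_test_per_class:n_test_per_class + n_train_per_class],
        }
    return splits
```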
The researchers constructed a two-class CNN model aimed at classifying the two BI-RADS breast density categories. The CNN used an AlexNet model that had been pretrained on the ImageNet nonmedical imaging dataset and was then fine-tuned on the institution's own mammography images.
"Fine-tuning is necessary because otherwise the standard pretrained model was not able to make any meaningful classification," he said.
The model was then implemented using the Caffe platform, Mohamed said.
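The talk named Caffe but not a specific training script; purely for illustration, an analogous two-class fine-tuning setup in PyTorch -- not the authors' code -- could look like the sketch below, where the optimizer and learning rate are assumptions.

```python
# Illustrative PyTorch analogue of fine-tuning an ImageNet-pretrained AlexNet
# for two-class BI-RADS density classification (the study itself used Caffe).
import torch
import torch.nn as nn
from torchvision import models

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

# Replace the 1,000-way ImageNet classifier head with a 2-way head
# (BI-RADS density B vs. C).
model.classifier[6] = nn.Linear(4096, 2)

# Fine-tune: all layers remain trainable, but a small learning rate keeps the
# pretrained features largely intact while the new head adapts to mammograms.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One gradient update on a batch of mammogram images and 0/1 labels."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```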
Next, the researchers performed receiver operating characteristic (ROC) analysis, using the area under the curve (AUC) to measure the performance of the algorithm. They also evaluated the effects of transfer learning -- the model's ability to store knowledge gained while solving one problem and then apply it to solve a new problem -- and performed several robustness analyses.
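As a rough sketch of that kind of evaluation (scikit-learn is used here only for illustration; the presentation did not name an analysis toolkit), the AUC can be computed from the model's predicted probabilities for category C against the ground-truth labels.

```python
# Illustrative ROC/AUC evaluation: labels are 0 for BI-RADS B and 1 for
# BI-RADS C; scores are the model's predicted probabilities for category C.
from sklearn.metrics import roc_auc_score, roc_curve

def evaluate(labels, scores):
    auc = roc_auc_score(labels, scores)              # area under the ROC curve
    fpr, tpr, thresholds = roc_curve(labels, scores)  # points along the ROC curve
    return auc, fpr, tpr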
Due to reader variability, they tested the model's robustness by removing images that were potentially categorized inaccurately by the radiologist. Those images were identified after calculating a quantitative breast density percentage and comparing that result with the BI-RADS-based density categories. The average breast density percentage was 15% for the exams judged to be BI-RADS density category B and 29.5% for BI-RADS C.
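The exact rule used to flag potentially miscategorized exams was not spelled out; one plausible reading, with an assumed percent-density cutoff placed between the reported category averages, is sketched below.

```python
# Hypothetical sketch of the robustness filtering step: flag exams whose
# quantitative percent density disagrees with the radiologist's BI-RADS label.
# The cutoff is an assumption for illustration, not the study's actual rule.
DENSITY_CUTOFF = 22.0  # assumed midpoint between the reported B (15%) and C (29.5%) averages

def is_potentially_mislabeled(birads_label, percent_density, cutoff=DENSITY_CUTOFF):
    """Return True when the quantitative measure contradicts the BI-RADS label."""
    if birads_label == "B":
        return percent_density >= cutoff   # labeled nondense but measured dense
    if birads_label == "C":
        return percent_density < cutoff    # labeled dense but measured nondense
    raise ValueError(f"unexpected label: {birads_label}")

def filter_exams(exams):
    """Keep only exams whose BI-RADS label and percent density agree."""
    return [e for e in exams
            if not is_potentially_mislabeled(e["birads"], e["percent_density"])]
```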
High accuracy levels
The researchers found that culling the potentially mislabeled images was worth the effort, yielding improved accuracy whether or not transfer learning had been incorporated.
Effect of transfer learning on breast density categorization

| | Before removal of potentially miscategorized images | After removal of potentially miscategorized images |
| --- | --- | --- |
| Algorithm without transfer learning | AUC = 0.9421 | AUC = 0.9692 |
| Algorithm with transfer learning | AUC = 0.9243 | AUC = 0.9726 |
"The high AUCs in both cases showed the deep learning-based classifier of breast density is robust to a real-world clinical dataset," Mohamed said. "This work adds a new example of applying deep learning and transfer learning in analyzing a large clinical breast imaging dataset."
Mohamed acknowledged a number of limitations of the study, including its single-center, retrospective nature. In addition, the studied images were read by many radiologists, and the researchers did not track which radiologist interpreted which images, he said.
In future work, the researchers plan to compare the deep learning-based method with traditional feature engineering-created descriptors, according to Mohamed.