A deep-learning algorithm can provide fully automated analysis of breast density on screening mammography exams, yielding density estimates that correlate well with assessments provided by radiologists, according to research published online January 24 in Medical Physics.
Researchers from the University of Pittsburgh Medical Center (UPMC) trained a deep-learning algorithm to segment dense fibroglandular tissue on mammograms. In testing, the algorithm's calculation of breast percent density (PD) correlated well with radiologists' classifications of breast density using BI-RADS, and it also outperformed an existing breast density estimation algorithm, according to the authors.
An active research area
Women with dense breasts face a higher risk of breast cancer than women with fatty breasts, and assessing breast density from screening mammograms is an active area of research, said lead author Juhun Lee, PhD. Radiologists typically use BI-RADS density classification -- the most widely accepted classification method -- to assign women to one of four breast density categories: 1 (entirely fatty), 2 (scattered), 3 (heterogeneously dense), and 4 (extremely dense).
"However, BI-RADS breast density classification is subjective and coarse, and therefore, different radiologists can assign a different BI-RADS density level to the same breast, especially for moderately dense breasts, which can be categorized as either scattered or heterogeneously dense," the authors wrote.
As a result, many previous studies have explored PD -- the relative proportion of dense fibroglandular tissue in a breast -- as an alternative to the four BI-RADS density categories. Researchers have developed various automated algorithms for estimating PD from mammograms, but these models have either required considerable human input or relied on a predefined list of rules, according to Lee.
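To make the PD definition concrete, here is a minimal sketch that computes percent density from two binary masks: one marking the breast area and one marking dense tissue. The function name `percent_density` and the toy masks are illustrative assumptions and are not taken from the study.

```python
def percent_density(breast_mask, dense_mask):
    """Return PD (%): dense pixels inside the breast / breast pixels * 100.

    Both masks are equally sized 2D lists of 0/1 values (hypothetical format).
    """
    breast_pixels = 0
    dense_pixels = 0
    for breast_row, dense_row in zip(breast_mask, dense_mask):
        for in_breast, is_dense in zip(breast_row, dense_row):
            if in_breast:
                breast_pixels += 1
                # Count dense tissue only where it lies inside the breast area.
                if is_dense:
                    dense_pixels += 1
    if breast_pixels == 0:
        raise ValueError("breast mask is empty")
    return 100.0 * dense_pixels / breast_pixels


# Toy 4x4 example: 12 breast pixels, 3 of them dense.
breast = [
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
]
dense = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(percent_density(breast, dense))  # 25.0
```

Unlike a four-level category, the result is a continuous value, which is why PD can distinguish between breasts that would share the same BI-RADS label.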
"The algorithm fails if it encounters an event that is out of the predefined rules," he told AuntMinnie.com. "This is a universal issue for automated algorithms built from human-defined rules."
Deep learning
To overcome such limitations, researchers in the computer-vision and machine-learning communities introduced deep learning, in which an algorithm learns and determines the rules to solve given problems, Lee said. The UPMC team sought to use deep learning to develop a fully automated algorithm for estimating mammographic density.
"Specifically, we used a deep-learning framework developed for image segmentation, i.e., fully convolutional network (FCN), which showed successful performance in segmenting objects in natural scenes, to segment both the breast and the dense fibroglandular tissue areas from mammograms," the authors wrote.
They trained the algorithm -- a modified version of the VGG-16 convolutional neural network -- using screening mammograms from 604 women who had received breast cancer screening at the UPMC network between 2007 and 2013. All mammograms were acquired on a Selenia full-field digital mammography system (Hologic), and they included the left and right mediolateral-oblique (MLO) views as well as the left and right cranial-caudal (CC) views. The researchers used the "for presentation" version of each mammogram.
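In general, a fully convolutional network produces one score map per class, and the segmentation is obtained by assigning each pixel to the class with the highest score. The sketch below shows only that final step, not the network itself. The two-class setup (0 = not dense, 1 = dense), the function name `scores_to_label_map`, and the score values are assumptions made for illustration; the article does not describe the UPMC model's output format.

```python
def scores_to_label_map(score_maps):
    """Convert per-class score maps into a per-pixel label map by argmax.

    score_maps[k][row][col] is the score of class k at that pixel.
    Ties go to the lower class index.
    """
    n_classes = len(score_maps)
    height = len(score_maps[0])
    width = len(score_maps[0][0])
    label_map = []
    for r in range(height):
        row = []
        for c in range(width):
            best = max(range(n_classes), key=lambda k: score_maps[k][r][c])
            row.append(best)
        label_map.append(row)
    return label_map


# Hypothetical scores for a 2x3 image patch.
not_dense_scores = [[0.9, 0.2, 0.6], [0.7, 0.1, 0.5]]
dense_scores = [[0.1, 0.8, 0.4], [0.3, 0.9, 0.5]]
labels = scores_to_label_map([not_dense_scores, dense_scores])
print(labels)  # [[0, 1, 0], [0, 1, 0]]
```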
For the purposes of the study, the researchers established a ground truth based on manual segmentation of the breast areas on the mammograms, as well as segmentation of the dense fibroglandular tissues using simple thresholding based on the radiologist's BI-RADS assessment. Of the exams from the 604 women, 455 were used for training and 58 were reserved for testing the deep-learning algorithm. The remaining 91 exams were set aside for validation and included a similar number of cases for each density level.
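The three-way partition described above can be sketched as a seeded random split with the reported sizes (455, 58, and 91, which sum to 604). This is only an illustration: the function name `split_cases`, the integer case IDs, and the seed are assumptions, and a plain shuffle does not reproduce the balancing across density levels that the researchers applied to the validation set.

```python
import random


def split_cases(case_ids, n_train=455, n_test=58, n_val=91, seed=0):
    """Randomly partition case IDs into train/test/validation subsets."""
    if n_train + n_test + n_val != len(case_ids):
        raise ValueError("split sizes must sum to the number of cases")
    shuffled = list(case_ids)
    # A local, seeded generator keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]
    return train, test, val


case_ids = list(range(604))  # hypothetical IDs, one per woman
train, test, val = split_cases(case_ids)
print(len(train), len(test), len(val))  # 455 58 91
```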
After the algorithm was trained, the researchers evaluated the model's performance by comparing its PD estimates with the radiologists' BI-RADS density assessments. Using the same validation dataset, the group also assessed a similar algorithm -- Laboratory for Individualized Breast Radiodensity Assessment (LIBRA) -- developed by the University of Pennsylvania.
The algorithm proposed by the UPMC researchers produced PD estimates that correlated well -- as determined via Pearson's rho correlation values -- with the ground-truth BI-RADS density ratings by radiologists, according to the authors. On a scale where 1.00 indicates perfect correlation with the ground truth, the UPMC algorithm outperformed the LIBRA algorithm for the CC view, the MLO view, and the average of the two.
Performance of algorithms for assessing breast density vs. ground truth
| View | LIBRA algorithm correlation values (rho) | UPMC algorithm correlation values (rho) |
| --- | --- | --- |
| Cranial-caudal | 0.58 | 0.81 |
| Mediolateral-oblique | 0.71 | 0.79 |
| CC-MLO-averaged | 0.69 | 0.85 |
The difference in performance between the proposed algorithm and LIBRA was statistically significant for the CC view (p < 0.0001) and the CC-MLO-averaged view (p < 0.006).
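For readers unfamiliar with the metric, the following sketch shows how a Pearson correlation between an algorithm's PD estimates and radiologists' BI-RADS ratings could be computed. The data values are invented for illustration and are not from the study, and the function name `pearson_correlation` is an assumption.

```python
import math


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient between two sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical data: BI-RADS density category (1-4) and estimated PD (%).
birads = [1, 1, 2, 2, 3, 3, 4, 4]
pd_estimates = [4.0, 9.0, 14.0, 22.0, 31.0, 38.0, 52.0, 61.0]
rho = pearson_correlation(pd_estimates, birads)
print(round(rho, 2))
```

A value near 1.00 means the PD estimates rise almost linearly with the BI-RADS category; values such as the 0.58 reported for LIBRA on the CC view indicate a weaker linear relationship.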
Research benefit
Because BI-RADS assessment is the current clinically accepted measure of breast density, the algorithm -- and its ability to provide reproducible breast density measurements -- would benefit research projects more than clinical practice in the short term, Lee said.
"When researchers explore breast density to predict cancer risk, breast [PD estimates] by the algorithm would provide more information than four-category BI-RADS density by radiologists, which are not reproducible and variable [between radiologists]," he said.
As the algorithm also segments the dense glandular tissue area of the breast, other researchers could use it to solve their own research problems related to dense glandular tissue.
"For example, researchers who develop computer-aided detection [software for] breast cancer may use our algorithm to first find dense glandular tissue area and then detect breast cancer," he said.
Future plans
The researchers are in the process of releasing the algorithm to the public via a code-sharing service such as GitHub, Lee said. In the future, they also want to test their algorithm for estimating breast density on digital breast tomosynthesis exams, as well as on mammography exams acquired on equipment from vendors other than Hologic.
"As each vendor uses different image processing algorithms for presentation of mammography exams, we will test the proposed algorithm on mammograms from other vendors to check the robustness of the proposed algorithm," Lee said. "It may be necessary to train the algorithm with cases from other vendors."