CHICAGO - A deep-learning algorithm can automatically assess a lung nodule's risk of malignancy with a high level of accuracy, according to a presentation on Sunday at the RSNA 2017 meeting.
Researchers from the University of California, San Francisco (UCSF) Medical Center used data from the U.S. National Lung Screening Trial (NLST) to train a 3D convolutional neural network. In testing on a separate dataset, the algorithm achieved an area under the receiver operating characteristic (ROC) curve of close to 0.9 for predicting the risk of malignancy of lung nodules, according to presenter Dr. Jae Ho Sohn, a resident physician at UCSF.
"There is inherent subjectivity in deciding a pulmonary nodule's chance of malignancy among radiologists" -- therefore, such an objective measure will be important, Sohn said.
A diagnostic challenge
Lung and bronchus cancer is the leading cause of cancer death in the U.S., resulting in 27% of male cancer deaths and 25% of female cancer deaths in 2017. It's also a field that's significantly underresearched and underfunded, according to statistics from the U.S. National Institutes of Health (NIH), Sohn said.
Through chest x-rays and chest CTs, radiology plays an essential role in the diagnosis and screening workup of lung cancer. When radiologists encounter lung nodules on chest CTs, they are tasked with estimating each nodule's risk of malignancy. A number of published papers offer criteria for gauging that risk.
"But unfortunately, even among thoracic radiologists, there is some subjectivity and variability in the way we risk-stratify these lung nodules," Sohn said.
Differences in risk stratification lead to differences in determining the next step for managing these patients. A further complicating factor is the difficulty of communicating these imaging findings in words alone.
"So there's an important need for accurate and more objective risk stratification of these nodules," he said.
Deep learning
To help address this problem, the researchers sought to design a data-driven deep-learning algorithm to stratify risk and predict the precise likelihood of malignancy for nodules on chest CT. They used image data from a subset of 1,612 patients with chest CT studies from the NLST. Of these patients, 320 had cancer. The cancerous and benign nodules were manually annotated by a radiologist.
Sohn said they used a relatively standard deep-learning approach that consisted of a five-step process:
- Data annotation
- Image preprocessing
- Data augmentation to improve the model's accuracy
- Model training and validation
- Error analysis and results reporting
Most of the cancer-positive cases in the NLST dataset were pathology-proven, while the benign cases were deemed to be nonmalignant after being followed for a period of time. The x, y, and z coordinates as well as slice locations of the nodules were saved.
After preprocessing, the lung was partitioned into numerous 3 x 3 x 3-cm volume blobs, both around the nodule coordinates and at random blank coordinates.
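The presentation did not describe how the blobs were cut, but the idea can be sketched in a few lines. The example below is an illustration only: it assumes the CT is a NumPy array indexed (z, y, x) with known voxel spacing in millimeters, and the function name `extract_cube`, the 30-mm edge parameter, and the air-like padding value of -1000 are all hypothetical choices, not details from the study.

```python
import numpy as np

def extract_cube(volume, center_zyx, spacing_mm, edge_mm=30.0, pad_value=-1000):
    """Crop a cube of physical size edge_mm centered on center_zyx.

    Assumes center_zyx holds voxel indices that lie inside the volume.
    Regions of the cube that fall outside the scan are filled with pad_value.
    """
    # Number of voxels per axis needed to span edge_mm at this spacing.
    size = [max(1, int(round(edge_mm / s))) for s in spacing_mm]
    out = np.full(size, pad_value, dtype=volume.dtype)
    src, dst = [], []
    for c, n, dim in zip(center_zyx, size, volume.shape):
        start = int(c) - n // 2
        stop = start + n
        lo, hi = max(start, 0), min(stop, dim)  # clip to the scan bounds
        src.append(slice(lo, hi))
        dst.append(slice(lo - start, hi - start))
    out[tuple(dst)] = volume[tuple(src)]
    return out
```

Calling the same function with a randomly chosen center away from any annotated nodule would yield the "blank" negative examples.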
"All of these steps were done to really help with loading onto the neural network and then solving a problem," he said.
Perhaps the most distinctive aspect of the project was the use of an "aggressive" data augmentation approach, given the low number of cancerous nodules available for training. The researchers' approach included morphing, flipping, and distorting the nodules to augment the data. In addition, nodules were "transplanted" into blank spaces of the lungs on the images and used as positive examples of lung cancer.
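Two of these augmentations, flipping and transplanting, are simple enough to sketch. The code below is a hypothetical illustration, not the researchers' implementation: it assumes each nodule comes with a boolean voxel mask (the presentation mentioned only saved coordinates), and the function names are invented for the example. Morphing and distortion are not shown.

```python
import numpy as np

def random_flip(cube, rng):
    """Mirror the cube along each axis independently with probability 0.5."""
    for axis in range(cube.ndim):
        if rng.random() < 0.5:
            cube = np.flip(cube, axis=axis)
    return cube

def transplant(nodule, mask, background):
    """Paste nodule voxels (where mask is True) into a nodule-free cube.

    The boolean mask is an assumption of this sketch.
    """
    if not (nodule.shape == mask.shape == background.shape):
        raise ValueError("nodule, mask, and background must share one shape")
    return np.where(mask, nodule, background)
```

Each transplanted cube would then be labeled as a positive example, which is how a small pool of cancerous nodules can be stretched into a larger training set.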
The researchers employed a 3D convolutional neural network to classify the lung nodules on each of the 3 x 3 x 3-cm blobs, using the Keras open-source neural network library with a TensorFlow back end, Sohn said.
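The network's architecture was not detailed in the talk. What distinguishes a 3D convolutional network from the 2D kind is that its filters slide through depth as well as height and width, so each output value summarizes a small cube of neighboring voxels. The NumPy sketch below shows that core operation with a fixed kernel; it is a teaching example under that assumption, not the Keras model the researchers trained, and the function names are hypothetical.

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Slide a 3D kernel over a 3D volume (stride 1, no padding, no kernel flip)."""
    kd, kh, kw = kernel.shape
    d, h, w = volume.shape
    out = np.zeros((d - kd + 1, h - kh + 1, w - kw + 1), dtype=float)
    for z in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # Weighted sum over one cube-shaped neighborhood of voxels.
                window = volume[z:z + kd, y:y + kh, x:x + kw]
                out[z, y, x] = np.sum(window * kernel)
    return out

def relu(x):
    """Rectified linear activation, applied elementwise."""
    return np.maximum(x, 0.0)
```

In Keras, this operation is provided by the `Conv3D` layer, with the kernel weights learned during training rather than fixed in advance.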
Strong performance
In testing on a separate set of images, the algorithm classified cancer at an area under the receiver operating characteristic (ROC) curve of 0.89.
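An area under the ROC curve is not the same thing as accuracy. It can be read as the probability that a randomly chosen cancerous nodule receives a higher score than a randomly chosen benign one. The short sketch below computes it directly from that definition; the function name and the toy labels and scores in the usage note are made up for illustration and are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    labels are 1 (malignant) or 0 (benign); ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, because three of the four malignant-benign pairs are ranked correctly.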
"As a radiology resident, I often perform right on par or slightly less than [the algorithm's performance]," Sohn said. "So I still have a lot to learn as a resident, but [these results do] show that [the algorithm] is performing relatively well."
The researchers then performed error analysis to further investigate the model's performance. This involved checking the algorithm for any systematic errors and determining if radiologists would agree with the algorithm.
"Overall, I was really happy with the error analysis -- that there is no systematic error that the training algorithm was creating," Sohn said.
The algorithm has a number of strengths. It's a fully data-driven approach, and there is also potential for even better results if more data could be used for training, he said.
It does have some limitations, however. Given the augmentation strategy used for training, the algorithm is invariant to nodule location, so it cannot factor location into its risk estimate. That matters because the Fleischner Society guidelines note, for example, that upper-lobe nodules have a slightly higher chance of being cancerous, Sohn said.
The dataset was also from a screening population, so expanding it to include a greater variety of patients would be helpful, he said.