CHICAGO - An artificial intelligence (AI) algorithm can accurately predict whether a thyroid nodule is malignant, potentially helping to reduce the number of unnecessary biopsies performed on these patients, according to research presented on Sunday at the RSNA meeting.
A multi-institutional team trained a deep-learning model that could detect 95% of malignant nodules and potentially avoid more than one-third of fine-needle aspiration (FNA) biopsies of thyroid nodules.
"Deep learning/AI can accurately predict thyroid malignancy and has the potential to reduce the number of FNA biopsies," said presenter Ian Pan, a medical student at Warren Alpert Medical School in Providence, RI.
Unnecessary biopsies
Thyroid nodules are very common, but most are benign. In addition, thyroid cancers are usually indolent and slow-growing, Pan said.
The American College of Radiology's Thyroid Imaging Reporting and Data System (TI-RADS) was developed to decrease the number of unnecessary biopsies of thyroid nodules. These TI-RADS scores -- based on nodule features on ultrasound images -- are used along with the nodule size to determine whether a nodule should receive FNA biopsy. However, even with TI-RADS, a large number of nodules are still found to be benign after being biopsied, he said.
"We wanted to leverage AI to further reduce the number of unnecessary FNAs," Pan said.
The researchers gathered 151 malignant and 500 benign thyroid nodules from 571 patients at Washington University Medical Center in St. Louis. All ultrasound images were obtained on a Logiq E9 ultrasound scanner (GE Healthcare) using a linear transducer. For each nodule, one transverse view and one sagittal view were used, for a total of 1,302 images. Thyroid nodules were manually cropped out of the image, and the images were resized to 224 x 224 pixels.
"We thought that cropping the nodules would allow us to increase the relative resolution of the nodule for the network," Pan said.
The researchers trained and tested two different types of convolutional neural network (CNN) architectures: one based on MobileNet -- designed for embedded mobile applications -- and one based on a ResNet 50 CNN. The MobileNet CNN has 3.2 million parameters, while the ResNet 50 CNN has 23.6 million parameters.
"We wanted to see if a lightweight neural network designed for use on your cellphone could be as effective as a larger model," he said. "If successful, this would support the development of embedded deep-learning applications in mobile imaging devices."
The researchers performed tenfold double cross-validation to evaluate the models. Of the dataset, 80% was used for training the models and 10% was used for validation. The remaining 10% was set aside for testing. The cases were also stratified by malignancy status, and there was no overlap in patients among the different sets, Pan said.
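A patient-level, malignancy-stratified split like the one described above could be sketched as follows. This is only an illustration under stated assumptions, not the researchers' code: the function names, the rule that a patient counts as malignant if any of their nodules is malignant, and the choice of the next fold as the validation set are all hypothetical.

```python
import random
from collections import defaultdict


def patient_level_folds(nodules, n_folds=10, seed=0):
    """Assign each patient to one of n_folds folds, stratified by malignancy.

    nodules: list of (patient_id, is_malignant) tuples, one per nodule.
    Assumption for this sketch: a patient is treated as malignant if any
    of their nodules is malignant.
    """
    patient_label = defaultdict(bool)
    for patient_id, is_malignant in nodules:
        patient_label[patient_id] = patient_label[patient_id] or bool(is_malignant)

    rng = random.Random(seed)
    fold_of = {}
    for label in (True, False):
        patients = sorted(p for p, l in patient_label.items() if l == label)
        rng.shuffle(patients)
        # Deal patients round-robin so every fold gets a similar class mix.
        for i, patient_id in enumerate(patients):
            fold_of[patient_id] = i % n_folds
    return fold_of


def split_for_round(nodules, fold_of, k, n_folds=10):
    """Round k: fold k is the test set, fold k+1 the validation set,
    and the remaining folds (about 80%) form the training set."""
    test_fold, val_fold = k, (k + 1) % n_folds
    split = {"train": [], "val": [], "test": []}
    for nodule in nodules:
        fold = fold_of[nodule[0]]
        if fold == test_fold:
            split["test"].append(nodule)
        elif fold == val_fold:
            split["val"].append(nodule)
        else:
            split["train"].append(nodule)
    return split
```

Because folds are assigned per patient rather than per nodule or per image, both views of a nodule, and all nodules from the same patient, always land in the same set.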
Model performance was evaluated by determining the area under the curve (AUC) from receiver operating characteristic (ROC) analysis.
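For readers unfamiliar with the metric, the AUC can be read as the probability that a randomly chosen malignant nodule receives a higher score than a randomly chosen benign one. A minimal sketch of that calculation is below; the function name and the 0/1 label encoding are assumptions for illustration, and the talk did not say which software the researchers used.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (malignant, benign) pairs ranked correctly.

    labels: 1 for malignant, 0 for benign (assumed encoding).
    scores: model outputs, higher meaning more suspicious.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

This pairwise form is quadratic in the number of cases, which is fine for a dataset of a few hundred nodules; larger datasets would normally use a rank-based implementation.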
Ensemble models were created by using the three best individual models from the validation process. Each image/view was treated as its own data point during training. During inference, each model in the ensemble assigns a score to each view of the nodule, producing a total of six scores (three models multiplied by two views). The final malignancy score is calculated by averaging the six scores.
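The averaging step can be sketched in a few lines. Here each model is represented as a plain callable that maps a cropped view to a score; that interface and the function name are assumptions made for illustration, not the researchers' implementation.

```python
from statistics import mean


def ensemble_malignancy_score(models, views):
    """Average the scores from every (model, view) pair.

    models: callables that take one cropped view and return a score
            (hypothetical interface).
    views:  the cropped views of a single nodule, e.g. transverse and sagittal.
    With three models and two views this averages six scores.
    """
    scores = [model(view) for model in models for view in views]
    return mean(scores), scores
```

Returning the individual scores alongside the mean makes it easy to inspect how much the models or views disagree for a given nodule.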
The regions of interest surrounding the nodule are selected by the user for each view; each view is processed by all three models in the ensemble. The researchers found that a two-view ensemble of both the sagittal and transverse views yielded the best performance. In addition, MobileNet outperformed the larger ResNet 50 model.
Area under the curve (AUC) for predicting thyroid malignancy
Configuration | ResNet 50 | MobileNet
2-view (sagittal and transverse) ensemble | 0.838 | 0.863
At a threshold score of 0.1, the MobileNet CNN could reduce the number of negative FNA biopsies by 36% while maintaining 95% sensitivity, according to Pan.
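The trade-off at a given threshold can be computed as shown in this sketch. It assumes that every nodule in the dataset would otherwise undergo FNA, that nodules scoring below the threshold would be spared biopsy, and that labels are encoded as 1 for malignant and 0 for benign; the function name is hypothetical.

```python
def operating_point(labels, scores, threshold):
    """Return (sensitivity, fraction of benign biopsies avoided).

    A nodule with score >= threshold is flagged for FNA; one with a
    lower score is treated as low risk (assumption for this sketch).
    """
    malignant = [s for y, s in zip(labels, scores) if y == 1]
    benign = [s for y, s in zip(labels, scores) if y == 0]
    if not malignant or not benign:
        raise ValueError("need at least one malignant and one benign case")

    # Share of malignant nodules still sent to biopsy.
    sensitivity = sum(s >= threshold for s in malignant) / len(malignant)
    # Share of benign nodules that would no longer be biopsied.
    benign_avoided = sum(s < threshold for s in benign) / len(benign)
    return sensitivity, benign_avoided
```

Sweeping the threshold over a range of values and tabulating both numbers shows how much sensitivity is given up for each additional biopsy avoided, which fits the risk-stratification use the presenter describes.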
"We elected not to introduce a single binary decision threshold because we believe that our model would be better incorporated as a risk stratification tool, and that would also serve as additional information that clinicians could use to guide their decision to move forward with an FNA," he said.
In future work, the researchers plan to evaluate the model on independently collected data from another institution. They are also working to automate thyroid nodule cropping and to add the ability to automatically assign a TI-RADS rating. In addition, the researchers would like to compare the performance of the algorithm with the radiologist's TI-RADS rating and to examine the performance of TI-RADS in combination with the deep-learning/AI score, Pan said.
"We're also in the process of open sourcing our models," he said.
The research received the RSNA's Trainee Research Prize for a medical student.