Machine learning might one day be able to tell which breast cancer patients will benefit from additional genetic testing. In a recent study, researchers used natural language processing to identify key features from ultrasound reports associated with cancer recurrence risk.
The study included hundreds of women with breast cancer who had previously undergone genetic tests, also known as transcriptomic tests, to determine their risk of cancer recurrence. After using a script to parse the women's BI-RADS ultrasound findings, the researchers found two key features that may identify when a patient could benefit from additional testing.
"Ultrasound findings, notably the 'retrotumoral' and 'margins' features, if abnormal, may help provide justification to obtain one of the transcriptomic tests," wrote the authors, led by Dr. Neema Jamshidi, PhD, a diagnostic radiologist from the University of California, Los Angeles (UCLA) David Geffen School of Medicine (Plos One, January 10, 2020). "Future multi-institutional prospective studies will be important in determining if these observations persist in larger cohorts."
Keywords for recurrence risk
New genetic tests, including Oncotype DX and MammaPrint, can analyze gene expression to estimate a patient's risk for cancer recurrence and, therefore, the need for chemotherapy. While these tests may save patients with low recurrence risk from going through chemotherapy, the tests can cost thousands of dollars and may be prohibitively expensive.
The researchers wanted to see whether machine learning could search the BI-RADS terminology in ultrasound reports for keywords that indicate when a patient might benefit from additional testing.
To do so, they acquired data from the electronic health records of 219 patients with breast cancer at the Harbor-UCLA Medical Center between April 2008 and January 2013. All patients had an ultrasound scan performed when they were first diagnosed with breast cancer. They also all had either an Oncotype DX or MammaPrint test to identify their risk of cancer recurrence.
The researchers wrote a custom script to analyze the BI-RADS descriptive terminology in the reports from the women's initial ultrasound scans. The program searched that terminology for words or phrases associated with cancer recurrence risk.
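The article does not describe how the researchers' script works internally. As a rough illustration of this kind of keyword search, the Python sketch below flags which features a free-text report mentions. The term lists, the `FEATURE_TERMS` name, the `extract_features` function, and the example sentence are all hypothetical and are not taken from the study.

```python
import re

# Hypothetical descriptor vocabulary, for illustration only.
# It is NOT the lexicon or feature coding used in the study.
FEATURE_TERMS = {
    "margins": ["indistinct", "angular", "microlobulated", "spiculated"],
    "retrotumoral": ["shadowing", "enhancement"],
    "internal echoes": ["heterogeneous", "hypoechoic"],
}


def extract_features(report: str) -> dict[str, bool]:
    """Return, per feature, whether any of its terms appears in the report.

    This is plain keyword matching: it does not handle negation
    ("no posterior shadowing") or misspellings.
    """
    text = report.lower()
    flags = {}
    for feature, terms in FEATURE_TERMS.items():
        # Whole-word match against any term listed for this feature.
        pattern = r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b"
        flags[feature] = re.search(pattern, text) is not None
    return flags


# Made-up report sentence.
example = "Irregular mass with spiculated margins and posterior shadowing."
print(extract_features(example))
```

A real pipeline would also need to handle negated phrases and variations in how radiologists word their reports, which simple keyword matching misses.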
In particular, three sonographic features -- "margins," "retrotumoral," and "internal echoes" -- were correlated with the genetic test results. The features "margins" and "retrotumoral" appeared in both the MammaPrint and Oncotype DX classification trees, while "internal echoes" only appeared in the MammaPrint classification tree.
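A classification tree works by repeatedly picking the feature that best separates patients into risk groups. The article does not say which tree algorithm or split criterion the researchers used, so the Python sketch below assumes Gini impurity, a common choice, and runs on made-up toy data. The function names and the data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(rows, labels, feature):
    """Weighted Gini impurity after splitting on a true/false feature."""
    present = [y for x, y in zip(rows, labels) if x[feature]]
    absent = [y for x, y in zip(rows, labels) if not x[feature]]
    n = len(labels)
    return len(present) / n * gini(present) + len(absent) / n * gini(absent)


def best_feature(rows, labels):
    """Pick the feature whose split leaves the least mixed groups."""
    return min(rows[0], key=lambda f: split_impurity(rows, labels, f))


# Made-up toy data: NOT patient data and NOT results from the study.
rows = [
    {"margins": True, "retrotumoral": True, "internal echoes": False},
    {"margins": True, "retrotumoral": False, "internal echoes": True},
    {"margins": False, "retrotumoral": False, "internal echoes": True},
    {"margins": False, "retrotumoral": False, "internal echoes": False},
]
labels = ["high", "high", "low", "low"]

print(best_feature(rows, labels))  # margins
```

In this toy set, splitting on "margins" separates the two risk groups perfectly, so it becomes the first branch. A full tree would repeat the same selection within each resulting group.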
"Given the invasive nature of tissue-based tests and the cost associated with tissue biopsies, processing, and analysis, in addition to the costs of commercial tests, the use of ultrasound imaging information to help identify cases in which transcriptomic tests may alter patient management provides a potential means to make the transcriptomic tests more cost-effective," the authors wrote.
More patients needed
The study had limitations. The model had a 77% chance of matching ultrasound terminology to recurrence risk for the Oncotype DX test and a 65% chance for the MammaPrint test.
In addition, the study had a limited number of patients, and the number of patients who received each test differed.
The researchers hope future studies will enroll more patients to determine whether atypical findings related to the "margins" and "retrotumoral" features can truly help determine which patients would benefit from genetic testing. They also hope studies will evaluate other types of genetic tests.
"The predictive capability using structured language from diagnostic ultrasound reports was moderate for the two tests, and provides added value from ultrasound imaging without incurring any additional costs," the authors concluded. "Incorporation of additional measures, such as ultrasound contrast enhancement, with validation in larger, prospective studies may further substantiate these results and potentially demonstrate even greater predictive utility."