Can machine-learning models predict IR outcomes?

May 21, 2020


Machine-learning algorithms can utilize nonimaging data such as patient demographics and prior medical history to predict outcomes from interventional radiology (IR) procedures, according to research published online May 4 in the Journal of Vascular and Interventional Radiology.

A team of researchers from Brown University in Providence, RI, trained random-forest machine-learning models to predict three specific outcomes from interventional radiology procedures:

Iatrogenic pneumothorax from a CT-guided transthoracic biopsy (TTB)
In-hospital mortality after a transjugular intrahepatic portosystemic shunt (TIPS)
Patient length of stay longer than three days following a uterine artery embolization (UAE) procedure

In testing, model performance ranged from acceptable to excellent, according to the group.

"The results demonstrate that these models may be effective at predicting outcomes after procedures when trained on a large national dataset," wrote the authors led by Ishan Sinha. "This is particularly exciting considering that all of the model inputs are features that were available before admission."

The models were trained using retrospective patient data and outcomes from the Agency for Healthcare Research and Quality's National Inpatient Sample. The researchers selected the features corresponding to the clinical diagnoses separately for each of the models.

Of the data, 50% were used for training, 25% for tuning, and 25% were set aside for testing. In testing, the TTB model had the highest area under the curve (AUC), while the UAE model had the highest maximum F1 score, the harmonic mean of the test's precision and recall.
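For readers who want to see what a 50/25/25 partition looks like in practice, here is a minimal sketch in Python. The article does not say whether the researchers split the data randomly, by stratum, or by hospital, so the random shuffle, the function name `split_50_25_25`, and the `seed` parameter are illustrative assumptions rather than the study's method.

```python
import random


def split_50_25_25(records, seed=0):
    """Shuffle records and cut them into train/tune/test partitions.

    Assumption: a simple random split. The study may have used a
    different scheme (for example, stratifying by outcome).
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility

    n = len(shuffled)
    n_train = n // 2   # 50% for training
    n_tune = n // 4    # 25% for tuning

    train = shuffled[:n_train]
    tune = shuffled[n_train:n_train + n_tune]
    test = shuffled[n_train + n_tune:]  # remainder, about 25%, for testing
    return train, tune, test


# Hypothetical stand-in for patient records: 1,000 integer IDs.
train, tune, test = split_50_25_25(range(1000))
print(len(train), len(tune), len(test))
```

Keeping the test partition untouched until the end is what makes the reported test metrics an estimate of performance on unseen patients.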

Performance of machine-learning models for predicting IR procedure outcomes
	Iatrogenic pneumothorax from CT-guided TTB	In-hospital mortality after TIPS	Length of stay > 3 days after a UAE procedure
AUC	0.913	0.788	0.879
Maximum F1 score	0.532	0.357	0.7
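A "maximum F1 score" is usually obtained by sweeping the decision threshold over the model's predicted probabilities and keeping the best F1 value. The sketch below assumes that interpretation, which the article does not spell out. The labels and scores are made-up toy values, not study data, and the helper names `f1_score` and `max_f1` are hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def max_f1(y_true, scores):
    """Try every distinct score as a threshold; return (best F1, threshold)."""
    best_f1, best_threshold = 0.0, None
    for threshold in sorted(set(scores)):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        f1 = f1_score(y_true, y_pred)
        if f1 > best_f1:
            best_f1, best_threshold = f1, threshold
    return best_f1, best_threshold


# Toy example (illustrative values only).
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
best_f1, best_threshold = max_f1(y_true, scores)
print(round(best_f1, 3), best_threshold)
```

Because F1 ignores true negatives, it tends to be lower for rare outcomes, which may help explain why a model can post a high AUC alongside a modest F1 score.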

Based on established guidelines for assessing AUC results, the TTB model was deemed to be outstanding, while the UAE model was considered to be excellent. The TIPS model was judged to be acceptable, according to the researchers.
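The article does not name the guidelines the researchers used. One commonly cited rule of thumb treats an AUC of 0.7 to 0.8 as acceptable, 0.8 to 0.9 as excellent, and 0.9 or higher as outstanding. Assuming those cutoffs, the short sketch below reproduces the labels reported for the three models; the function name `auc_band` and the exact boundaries are assumptions, not details taken from the study.

```python
def auc_band(auc):
    """Map an AUC value to a qualitative label.

    Assumption: rule-of-thumb cutoffs of 0.7, 0.8, and 0.9.
    """
    if auc >= 0.9:
        return "outstanding"
    if auc >= 0.8:
        return "excellent"
    if auc >= 0.7:
        return "acceptable"
    return "below acceptable"


# AUC values as reported in the table above.
reported_auc = {"TTB": 0.913, "TIPS": 0.788, "UAE": 0.879}
labels = {model: auc_band(auc) for model, auc in reported_auc.items()}

for model, label in labels.items():
    print(f"{model}: AUC {reported_auc[model]} -> {label}")
```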

The researchers noted that the accuracy of these machine-learning models is influenced by factors such as the size of the dataset, the imbalance in outcome incidence, the available features, and the proportion of missing data.

"Ultimately, results from this investigation encourage the application of machine-learning methods to IR decision support tools through the use of high-quality data," the authors wrote.

Although these results are encouraging, it's not surprising that a new tool isn't able to trivialize difficult problems, said Dr. Andrew Taylor, PhD, of the University of California, San Francisco, in an accompanying editorial published online May 19.

"However, given their success to date with imperfect input material, imagine what can be accomplished in medical AI if there is commitment to curating large sets of data, developing infrastructure that supports data sharing and cross-platform deployment of AI models, and working toward developing algorithms that can provide more explanation of the insights they are pulling from complex systems," Taylor wrote. "Investment in these projects now can help guide the next 10 years of development and determine whether the decade of the algorithm is remembered as the beginning of a new era in data-driven medicine or a passing technologic curiosity."