Can AI learn how to understand radiologist reports?


Can artificial intelligence (AI) technology learn how to understand radiologist reports? Yes, it can, and this capability could spark development of the next generation of deep-learning algorithms to aid radiologists, according to research published online January 30 in Radiology.

A team of researchers led by John Zech of the Icahn School of Medicine in New York City evaluated several different machine-learning models based on natural language processing (NLP) to identify findings on head CT reports. In testing on 402 cases, the best-performing models had an area under the curve (AUC) of 0.966 for identifying a critical finding and an average AUC of 0.957 for all head CT findings.

The study lays the groundwork for training the next generation of AI tools to assist radiologists in image interpretation, according to senior author and neurosurgeon Dr. Eric Oermann.

"Data is like the new oil; it needs to be somehow refined and processed," Oermann said in a video from Mount Sinai Health System.

The researchers set out to generate machine-readable features -- i.e., text describing the presence or absence of hemorrhage, fracture, stroke, etc. -- from head CT radiology reports, Zech said.

After randomly selecting 1,004 head CT reports gathered at two institutions between 2010 and 2016, the researchers preprocessed the reports and extracted machine-analyzable features using various NLP techniques such as bag-of-words (BOW), word embedding, and latent Dirichlet allocation (LDA)-based approaches. These head CT reports were also manually labeled by physicians for findings, and a subset of these cases were deemed to have critical results. Of the 1,004 reports, 602 (60%) were used for training the machine-learning models, while 402 (40%) were set aside for validating their performance.
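To make the feature-extraction step concrete, here is a minimal sketch (not the authors' code) of turning report text into bag-of-words counts and LDA topic mixtures with scikit-learn; the report snippets are hypothetical examples for illustration only.

```python
# Minimal sketch: bag-of-words and LDA topic features from report text.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical preprocessed report snippets (illustration only).
reports = [
    "no acute intracranial hemorrhage or fracture",
    "acute subdural hemorrhage along the left convexity",
    "chronic lacunar infarct no acute hemorrhage",
]

# Bag-of-words counts: one row per report, one column per vocabulary term.
bow = CountVectorizer()
X_bow = bow.fit_transform(reports)

# LDA topic mixtures derived from the word counts.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
X_topics = lda.fit_transform(X_bow)

print(X_bow.shape)     # (number of reports, vocabulary size)
print(X_topics.shape)  # (number of reports, number of topics)
```

Each report row then becomes a fixed-length numeric vector that a downstream classifier can consume.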

The best-performing model used BOW with unigrams, bigrams, and trigrams plus the average word embeddings vector, according to the researchers.

Performance of the best machine-learning model for identifying head CT findings

Metric                 Any critical head CT finding   Average for all head CT findings
Area under the curve   0.966                          0.957
Sensitivity            175 of 189 (92.6%)             1,898 of 2,103 (90.3%)
Specificity            191 of 213 (89.7%)             18,351 of 20,007 (91.7%)
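The sensitivity and specificity figures follow directly from the reported counts; a quick check of the critical-finding column:

```python
# Recompute sensitivity and specificity from the table's raw counts
# for the "any critical finding" column.
tp, fn = 175, 189 - 175   # critical findings flagged / missed
tn, fp = 191, 213 - 191   # non-critical cleared / false alarms

sensitivity = tp / (tp + fn)   # 175 / 189
specificity = tn / (tn + fp)   # 191 / 213

print(f"{sensitivity:.1%}, {specificity:.1%}")  # 92.6%, 89.7%
```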

"Because these labels [derived from the machine-learning models] are based purely on the text contained in the reports and do not incorporate any machine-derived features learned from the corresponding imaging, they can be used as an independent set of labels to train deep learning-based image classification models," the authors wrote.
