Machine learning boosts chest CT's performance


Machine learning-based CT texture analysis software improves reader accuracy for assessing patients with chronic obstructive pulmonary disease (COPD), interstitial lung diseases, or infectious diseases, according to a study published November 12 in the European Journal of Radiology.

The results suggest that machine learning could offer "second-reader" support to radiologists using CT to evaluate chest conditions, wrote a team led by Dr. Yoshiharu Ohno, PhD, of Fujita Health University School of Medicine in Toyoake, Aichi, Japan.

"Machine learning-based CT texture analysis software has potential for improving interobserver agreement and differentiation accuracy for assessment of radiological findings ... on thin-section CT for patients with COPD, interstitial lung diseases, or infectious diseases," the team wrote. "This software may be able to perform as a second reader for this ... purpose in routine clinical practice."

CT exams are key to tracking disease severity and evaluating the efficacy of treatment in patients with various lung conditions, but agreement among readers is often low, Ohno and colleagues noted. That's where machine learning could help.

"It has been suggested that one of the problematic factors for visual investigation [using CT] is relatively low interobserver agreements among physicians and radiologists with different experiences and specialties, and that sophisticated quantification can be attained by using commercially available proprietary software," they wrote.

The investigators used machine-learning software developed by Canon Medical Systems (CT Lung Parenchyma Analysis, prototype version 3) for texture analysis on thin-section chest CT in patients with COPD, interstitial lung disease, and infectious diseases such as pneumonia. They conducted the study to evaluate the software's ability to improve agreement among readers, as well as readers' accuracy for identifying disease findings. The software uses CT image data to classify voxels into seven texture patterns: normal lung, ground-glass opacity, reticulation, emphysema, nodular lesion, consolidation, and honeycomb.
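
The article doesn't describe the classifier's internals, but voxel-wise texture classification of this general kind can be sketched with a generic supervised model. The Python snippet below is a minimal illustration only, not the vendor's method: the three stand-in features, the random-forest classifier, and the synthetic training data are all assumptions.

```python
# Illustrative sketch of voxel-wise texture classification into the
# seven patterns named in the study. Features and model are assumed;
# the Canon software's actual approach is not described in the article.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

PATTERNS = ["normal lung", "ground-glass opacity", "reticulation",
            "emphysema", "nodular lesion", "consolidation", "honeycomb"]

rng = np.random.default_rng(0)
# Stand-in per-voxel texture features (e.g., local mean HU, local
# variance, gradient magnitude); real software would derive these
# from the thin-section CT image data.
X_train = rng.normal(size=(700, 3))
y_train = rng.integers(0, len(PATTERNS), size=700)  # synthetic labels

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

X_new = rng.normal(size=(5, 3))  # features for five unseen voxels
for label in clf.predict(X_new):
    print(PATTERNS[label])
```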

Ohno and colleagues used 28 cases to train the algorithm and 17 to validate it. The test cohort comprised 89 cases, which yielded 350 regions of interest (all exams were performed on Aquilion One or Aquilion 64 scanners, also from Canon Medical Systems).

A consensus team of three radiologists with varying specialties evaluated the test cases, as did the machine-learning software. A group of pulmonologists and chest radiologists then determined interobserver agreement among the three radiologist readers, with and without the software's help, using kappa statistics (values of 0.61-0.80 indicate substantial agreement and values of 0.81-1.00 indicate excellent agreement).
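
For readers unfamiliar with the metric, pairwise Cohen's kappa between readers can be computed with scikit-learn, as in the minimal sketch below. The readings here are invented for illustration, and the study may well have used a different kappa variant for three readers (the article doesn't say).

```python
# Minimal sketch: pairwise Cohen's kappa among three readers.
# Labels 0-6 stand in for the seven texture patterns; the readings
# are invented for illustration (the study's data are not shown).
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

readings = {
    "reader_A": [0, 1, 2, 2, 3, 4, 5, 6, 1, 0],
    "reader_B": [0, 1, 2, 3, 3, 4, 5, 6, 1, 0],
    "reader_C": [0, 1, 1, 2, 3, 4, 5, 6, 2, 0],
}

for a, b in combinations(readings, 2):
    kappa = cohen_kappa_score(readings[a], readings[b])
    print(f"{a} vs {b}: kappa = {kappa:.2f}")

# Interpretation scale cited in the article:
#   0.61-0.80 -> substantial agreement
#   0.81-1.00 -> excellent agreement
```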

The researchers found that the software improved agreement among the interpretations produced by the radiologist readers, as well as readers' accuracy for differentiating between various states and conditions reflected by the seven texture patterns.

Reader performance for CT with and without machine-learning software*

| Measure | Software alone | Consensus readings without software | Consensus readings with software |
|---|---|---|---|
| Kappa value | 0.79 | 0.81 | 0.91 |
| Differentiation accuracy | 82.3% | 84.3% | 94.9% |

*All results statistically significant at p < 0.0001.

The study is the first to show that artificial intelligence (AI) can assess thin-section CT findings with accuracy comparable to that of radiologists, and it suggests the software may serve as a second-reader tool for evaluating patients with lung diseases, according to Ohno and colleagues.

"Our results show that machine learning-based CT texture analysis can improve interobserver agreement and overall accuracy of differentiation for evaluation of radiological findings by radiologists with different specialties in chest thin-section CT," the group wrote. "In addition, the software has almost equally good potential for improving evaluation of radiological findings as well as consensus for readings by radiologists."

Ohno's team acknowledged that the study was limited by its use of CT images from a single vendor. More research is needed, the group concluded.

"Capabilities of this software for evaluation of radiological findings, disease severity or therapeutic efficacy would need to be assessed in future studies to investigate and validate ... the true significance of this software for routine clinical practice," it wrote.
