Can AI help detect congenital heart disease on fetal ultrasound?

May 13, 2021


An artificial intelligence (AI) model can significantly improve the detection of congenital heart disease (CHD) on fetal screening ultrasound exams, according to researchers from the University of California, San Francisco (UCSF).

UCSF researchers used clinical guidelines for detecting congenital heart disease as the basis for an ensemble of neural networks that analyzes fetal ultrasound surveys. In testing on internal and external datasets encompassing more than 4,500 exams, their algorithm yielded high sensitivity and specificity for identifying CHD.

"Combining clinical insights with an ensemble of neural networks can facilitate data-efficient strategies for improving clinically relevant use cases, even for rare diseases," said Dr. Rima Arnaout in a presentation at the recent Nvidia GPU Technology Conference. "And we can do this in a way that takes advantage of existing clinical guidelines instead of competing with them."

Although fetal ultrasound can, in theory, detect over 90% of congenital heart disease cases, fetal diagnosis rates of CHD worldwide can be as low as 30%-50%, according to Arnaout.

"That's a surprising gap between the possible and the real world and it's not just explained by women not being able to get ultrasound," Arnaout said.

To help close this gap and improve the diagnosis of CHD in the community, the researchers trained neural networks to analyze the five axial screening views of the heart that have been specified in clinical guidelines as being able to detect CHD: 3-vessel trachea, 3-vessel view, apical-5-chamber, apical-4-chamber, and abdomen.

"We did this because we hypothesized it would allow us to find diagnostic signals for a rare disease using a relatively small dataset," she said.

After training a convolutional neural network (CNN) to distinguish the five cardiac views of interest, the researchers developed binary diagnostic classifiers to determine whether each view was normal or abnormal. These per-view diagnoses were then combined into a composite diagnostic score indicating whether the heart was normal or abnormal overall.
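The article does not say how the per-view diagnoses are combined, so the following is only a minimal sketch of the general idea, assuming a plain average of per-view abnormality probabilities and a fixed cutoff. The function names, the 0.5 threshold, and the averaging rule are all hypothetical and should not be read as the UCSF team's actual method.

```python
from statistics import mean

# The five guideline views named in the article.
VIEWS = ("3VT", "3VV", "A5C", "A4C", "ABDO")


def per_view_scores(image_preds):
    """Average abnormality probability per view.

    image_preds: iterable of (view_label, p_abnormal) pairs, one per image,
    as a view classifier plus per-view binary classifiers might emit.
    Images assigned to any label outside VIEWS are ignored.
    """
    by_view = {}
    for view, p_abnormal in image_preds:
        if view in VIEWS:
            by_view.setdefault(view, []).append(p_abnormal)
    return {view: mean(probs) for view, probs in by_view.items()}


def composite_score(view_scores):
    """Hypothetical composite: unweighted mean over the views that were found."""
    if not view_scores:
        raise ValueError("no guideline views were identified in this exam")
    return mean(view_scores.values())


def is_abnormal(image_preds, threshold=0.5):
    """Flag the exam when the composite score reaches the assumed threshold."""
    return composite_score(per_view_scores(image_preds)) >= threshold


# Made-up predictions for a single exam; the "OTHER" image is discarded.
exam = [("3VT", 0.9), ("3VT", 0.7), ("A4C", 0.1), ("OTHER", 0.99)]
print(per_view_scores(exam))
print(is_abnormal(exam))
```

Averaging within each view first keeps a view with many frames from dominating the exam-level score, which is one reason a two-stage aggregation of this kind can be attractive.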

In addition, CNNs were used to calculate standard fetal cardiac measurements with the ultimate goal of including those in the composite score, she said. A neural network architecture based on ResNet was used for the classifiers, while a U-Net was utilized for segmentation.

The algorithms were trained using 1,326 retrospectively collected fetal echocardiograms and fetal screening ultrasounds from UCSF, consisting of approximately 100,000 images in total. The exams were acquired at screening age on scanners from multiple vendors. Cardiac diagnoses were validated by expert overreads and, if available, by postnatal diagnosis.

The algorithm produced 96% sensitivity and 92% specificity for determining if the specified view was present in the exam. The researchers then tested the model on 4,531 studies from four different test sets consisting of over 4.5 million images.

On a UCSF testing set of 4,108 fetal ultrasound surveys with a CHD prevalence of 0.9%, the model achieved 95% sensitivity, 96% specificity, 100% negative predictive value, and an area under the curve (AUC) of 0.99 for detecting CHD.
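The near-perfect negative predictive value follows largely from the low prevalence in this test set. As a rough check, the sketch below applies the standard Bayes relationship between sensitivity, specificity, prevalence, and negative predictive value to the rounded figures reported above. It assumes those rounded percentages are exact and is not a reanalysis of the study data; the function name is illustrative.

```python
def negative_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a case the model calls normal is truly normal."""
    # Fraction of all exams that are true negatives and false negatives.
    true_negatives = specificity * (1.0 - prevalence)
    false_negatives = (1.0 - sensitivity) * prevalence
    return true_negatives / (true_negatives + false_negatives)


# Rounded figures reported for the UCSF test set.
npv = negative_predictive_value(sensitivity=0.95, specificity=0.96, prevalence=0.009)
print(f"NPV = {npv:.2%}")
```

With these inputs the result is just above 99.9%, which rounds to the reported 100%.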

In testing on an enriched external set of 423 fetal echocardiograms from Boston Children's Hospital, 92% of which showed CHD, the algorithm demonstrated robust and generalizable performance, including 89% sensitivity, 92% specificity, 99% negative predictive value, and an AUC of 0.89.

Example test image shown per view (top row), with corresponding saliency map (unlabeled, second row; labeled, third row) showing that the pixels most important to the model in predicting the view highlighted anatomic structures important to each view. Fourth row: gradient-weighted class activation map (Grad-CAM) provides a heatmap of regions of the image most important to the model in predicting the view. Grad-CAMs were also highly specific for structures distinguishing each view; the confluence of the aortic and ductal arches compared to the aortic cross-section distinguishing 3VT from 3VV, for example, and the left ventricular outflow tract versus right heart distinguishing A5C from A4C. SM = saliency map; 3VT = 3-vessel trachea; 3VV = 3-vessel view; A5C = apical 5-chamber; A4C = apical 4-chamber; ABDO = abdomen; DA = ductal arch; AA = aortic arch; SVC = superior vena cava; PA = pulmonary artery; TV = tricuspid valve; AV = aortic valve; MV = mitral valve; IVS = interventricular septum; IAS = interatrial septum/foramen ovale. Images and caption courtesy of Dr. Rima Arnaout.

Notably, the algorithm also performed well in testing on lower-quality images, producing 95% sensitivity and 39% specificity for distinguishing between normal patients and those with CHD, Arnaout said. This indicates that the algorithm can still make use of suboptimal images in fetal surveys to detect complex CHD, albeit with lower specificity, according to the researchers.