NIH opens massive x-ray database to propel AI research


Optimizing artificial intelligence (AI) algorithms for medical applications demands immense stores of imaging data. Researchers may now begin sating that need thanks to an enormous database of chest x-rays recently made available by the U.S. National Institutes of Health (NIH).

Unlike most other large-scale datasets, this collection of more than 100,000 images has been released as a service to the public and the scientific community, with no holds barred, Dr. Ronald Summers, PhD, told AuntMinnie.com. Summers is a senior investigator and staff radiologist at the NIH Imaging Biomarkers and Computer-Aided Diagnosis Laboratory.

The NIH has had an ongoing commitment to data sharing and has finally been able to prepare a dataset of this size, Summers said. This is a huge step toward integrating AI mechanisms into clinical practice.

"The deep-learning methods that are all the rage now in radiology and in many other fields are very data hungry," he said. "And I think collections such as this one will be likely to enable progress toward the goal of more accurate AI for the task of computer-aided diagnosis on chest radiographs."

Going public

Accurately and efficiently reading and diagnosing chest x-rays is a complex procedure that radiologists have been fine-tuning for generations. Various deep-learning algorithms have been developed to improve the process, but the limited supply of imaging data has been a longstanding stumbling block for clinically meaningful application, according to Summers.

Dr. Ronald Summers, PhD, from the NIH Clinical Center.

As a means of clearing that hurdle, Summers and colleagues extracted approximately 60% of all of the then-current frontal chest x-rays performed at the NIH Clinical Center in Bethesda, MD, with the intention of providing them to the public.

"We wanted to release this dataset, but we had to make sure that we had properly taken care of any privacy issues about the dataset," he said. "It took us a while to appreciate what needed to be done for us to become comfortable with releasing it to the public."

Every single image needed to be manually inspected to confirm there was nothing that might make it identifiable, such as dates and personal information, Summers explained. In the end, all of the images were examined twice, with contributions from just about everyone in his lab.

The dataset consists of more than 112,000 frontal-view chest x-ray images of nearly 31,000 unique patients. There are also accompanying labels noting the chest pathology of each image as determined by natural language processing software that boasts an accuracy greater than 90%, according to the researchers.

The full database can be downloaded as of September 27 this year.

"We now have a means of releasing datasets more easily and we have experience with anonymizing large collections," he said. "So I'm hopeful we'll be able to release more data. Time will tell whether that will be more feasible."

More data, better performance

Other, albeit much smaller, public collections of imaging data had been available before these NIH chest x-ray images were released. Summers noted that whenever large datasets are made public, a great many researchers immediately begin using the data to try to improve detection and diagnosis performance.

The pattern became obvious to him after his lab released a collection of CT scans of lymph nodes in 2015 and a separate series of CT scans of the pancreas in 2016 through the Cancer Imaging Archive. The new chest x-ray dataset is similarly attracting considerable attention, with an average of 100 downloads per day in the first week, Summers said.

"I think it's well-understood that the more data is available for AI researchers, the greater their ability to create high-performing AI systems to do things like computer-aided detection and diagnosis," he said.

Each new dataset presented to existing AI algorithms enlarges their core knowledge and polishes their ability to detect and diagnose. And the quality of the images makes a difference, too, Summers said.

"I think the broader and deeper the datasets are in terms of breadth of imaging modalities and body parts and applications ... the more you'll see researchers pouring in to try to tackle problems relating to that application or imaging modality or disease," he said.

For example, Summers' group unveiled a project at RSNA 2016 in which a deep-learning system, consisting of convolutional neural networks and recurrent neural networks, was trained to provide a more human-like diagnosis by examining not just chest x-rays, but also report summaries annotated using Medical Subject Headings (MeSH) standardized terms. By continuing work in this area using the recently released dataset, the group has developed further ideas for improving the method's performance, according to Summers.

More to come

The NIH has slated yet another enormous database of CT scans for release to the public in the coming months. If anything, this news may be a harbinger of the growing use of AI algorithms for detecting and diagnosing disease.

Summers is particularly interested in the concept of fully automated imaging of abdominal CT, as expressed in his talk at the International Society for Computed Tomography symposium this summer. The abdomen has been relatively understudied compared to other regions of the body, and the plan is to continue seeking out the best ways to have computers automatically find and diagnose diseases in that part of the body, he told AuntMinnie.com.

Be it mining for common disease patterns, analyzing disease correlations, automating imaging reports, or some other clinically meaningful application, the union of AI and healthcare is drawing ever closer, according to Summers.

"AI research is accelerating at an exponential rate," he said. "The deep-learning technology is so readily available that we're seeing more and varied researchers entering the field, and I anticipate a great acceleration of progress over the next three to five years."

Ultimately, Summers hopes that these advances will contribute to one of the primary missions of the NIH, which is to reduce pain and suffering from disease.

"I think it's highly advantageous to get the datasets out, because the sooner they're out there, the sooner it's likely that patients will be able to benefit from these advanced image analysis tools," he said. "That's what it's all about -- improving patient care."
