CT databases aimed at lung imaging research

Aug 7, 2008

The lack of quality-controlled imaging databases has complicated lung cancer research, but help is on the way.

CT images of the lungs, used for evaluating lung cancer detection by radiologists as well as computer-aided detection (CAD) schemes, have always been something of a moving target due to a lack of histologic proof in most cases. Because most lung nodules aren't biopsied or surgically resected, findings must be backed up by the opinions of expert readers, whose views are in perennially short supply and often conflict.

Enter the National Cancer Institute (NCI) in Bethesda, MD, a branch of the National Institutes of Health that has been working with academic centers and a few industry and clinical partners to spur progress by researchers, especially CAD developers. A multiyear, multicenter project funded by the NCI is delivering new lung imaging databases on which such systems can be tested.

First, the Lung Imaging Database Consortium (LIDC), funded by the NCI with the cooperation of five major academic centers, is building a database of lung imaging cases that have been reviewed by several thoracic radiologists. The growing collection of images is available to the public for imaging research on the National Cancer Imaging Archive (NCIA) Web site.

The NCI, in cooperation with researchers from institutions including the University of California, Los Angeles (UCLA); Cornell University; the University of Chicago; the University of Michigan; and the University of Iowa, developed a unique multicenter data collection process and communication system to analyze and annotate a database of CT lung images. More than 400 cases are expected to be online by the end of the year.

In addition, a second project known as the Imaging Database Resources Initiative (IDRI) was created as an extension of the LIDC. IDRI includes the participation of vendors as well as academic centers, representing a "public-private partnership through the Foundation for the National Institutes of Health," said Michael McNitt-Gray, Ph.D., a participating researcher from UCLA who spoke at the 2008 International Symposium on Multidetector-Row CT.

The list of IDRI participants also includes two leading cancer treatment centers, Memorial Sloan-Kettering Cancer Center in New York City and the University of Texas M. D. Anderson Cancer Center in Houston, said McNitt-Gray, who is a professor of radiological sciences at UCLA's David Geffen School of Medicine.

The LIDC will make it easier for CAD developers to see how well their algorithms perform on carefully analyzed lung data.

"We're not building CAD systems, we're building resources for CAD algorithm developers," McNitt-Gray said. The LIDC will enable developers to correlate and compare the performance of their methods for "detection and classification of lung nodules with spatial, temporal, and pathological ground truth for each case," he said.

"We really struggled with the issue of truth to provide information about the presence or absence of nodules and the spatial extent of those nodules," McNitt-Gray said. "We thought it was going to be fairly obvious, and it turned out not to be."

The problem is variability of findings, he said. "We had some very expert thoracic radiologists read and annotate these CTs. Our studies and those of other groups indicated significant variability between expert readers," he said. The researchers also experimented among themselves and came to the same conclusion.

To optimize data analysis in the face of significant variability, the group designed a two-phase data collection process that would allow radiologists working at different centers to review and annotate each CT image series via the Internet.

"We allowed four readers to review and annotate each case," McNitt-Gray said. "We did not want a truth panel, and we did not force a consensus because we wanted to capture that variability between readers ... So we devised ways to capture that variability. Spatial truth: Where is the nodule? What are the boundaries?"

In the first or blinded phase, each radiologist reviewed CT series independently and marked the nodules. Then, the results from all four were compiled and presented to each radiologist for a second read.

This allowed each radiologist to review his or her own annotations along with those of the other radiologists. Every radiologist could see the markings of the others and either agree or disagree that each mark was a true positive. (The marks were color-coded so the radiologists could distinguish their own markings from those of other readers.)
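
The two-phase read described above can be illustrated with a short Python sketch. It is not the LIDC software: the reader names, the candidate labels, and both function names are hypothetical, and it assumes that marks from different readers have already been matched to the same candidate location, a step the article does not describe.

```python
# Hypothetical illustration of a blinded read followed by an unblinded read.
# Candidate labels ("A", "B", "C") stand in for already-matched locations.

def pool_blinded_marks(blinded):
    """Combine every reader's independent marks into one candidate list."""
    pooled = {}
    for reader, candidates in blinded.items():
        for cand in candidates:
            pooled.setdefault(cand, set()).add(reader)
    return pooled


def unblinded_tally(pooled, second_read):
    """Count how many readers accept each pooled candidate on the second read.

    No consensus is forced: the count per candidate is kept as-is, so
    disagreement between readers stays visible in the result.
    """
    return {
        cand: sum(1 for accepted in second_read.values() if cand in accepted)
        for cand in pooled
    }


# Phase 1: each of four readers marks nodules without seeing the others.
blinded = {
    "r1": {"A", "B"},
    "r2": {"A"},
    "r3": {"A", "C"},
    "r4": set(),
}

# Phase 2: each reader sees all pooled marks and accepts or rejects each one.
second_read = {
    "r1": {"A", "B"},
    "r2": {"A", "B"},
    "r3": {"A", "C"},
    "r4": {"A"},
}

pooled = pool_blinded_marks(blinded)
tally = unblinded_tally(pooled, second_read)
```

In this made-up case, candidate A is accepted by all four readers, B by two, and C by one. Keeping the raw counts rather than a single yes-or-no answer reflects the decision not to convene a truth panel.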

A lung nodule unmarked (top) and marked (bottom) from the LIDC database. All images courtesy of the National Cancer Institute.

An XML-based message system, which was implemented across the entire LIDC database, was used to communicate the results of each review.

"Only for nodules greater than 3 mm, we asked them to fill out a description: subtlety, shape, and based only on the imaging data, the likelihood of malignancy," McNitt-Gray said. They also were asked to outline the borders of the larger nodules. For smaller lesions, the radiologists marked only the centroid, without providing a description. Pathology information also is available for each case.
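
As a rough illustration of the size rule McNitt-Gray describes, the Python sketch below records different information for larger and smaller nodules. The class, field names, and rating values are hypothetical and do not reflect the actual LIDC XML schema. How a nodule of exactly 3 mm is handled is also an assumption; the sketch follows the quoted "greater than 3 mm" wording.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

SIZE_THRESHOLD_MM = 3.0  # threshold taken from the quoted description


@dataclass
class NoduleAnnotation:
    # Smaller lesions: only the centroid is recorded.
    centroid: Optional[Tuple[float, float, float]] = None
    # Larger nodules: outlined border plus a reader's description.
    outline: List[Tuple[float, float]] = field(default_factory=list)
    subtlety: Optional[int] = None               # hypothetical rating scale
    shape: Optional[str] = None
    malignancy_likelihood: Optional[int] = None  # from imaging data only


def annotate(diameter_mm, centroid, outline=None, subtlety=None,
             shape=None, malignancy_likelihood=None):
    """Return the fields a reader would fill in for a nodule of this size."""
    if diameter_mm > SIZE_THRESHOLD_MM:
        return NoduleAnnotation(
            outline=list(outline or []),
            subtlety=subtlety,
            shape=shape,
            malignancy_likelihood=malignancy_likelihood,
        )
    # At or below the threshold: centroid only, description is discarded.
    return NoduleAnnotation(centroid=centroid)


small = annotate(2.0, (10.0, 20.0, 5.0), outline=[(1.0, 1.0)], subtlety=3)
large = annotate(8.0, (0.0, 0.0, 0.0), outline=[(1.0, 1.0), (2.0, 2.0)],
                 subtlety=4, shape="round", malignancy_likelihood=2)
```

Here `small` keeps only its centroid, while `large` carries an outline and the three descriptive fields.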

RIDER: Follow-up of proven cancers

On the same NCIA Web site is the Reference Image Database to Evaluate Response (RIDER), McNitt-Gray said. "These are known cancer patients having serial CT exams, and there are no annotations," he said. "It is primarily for [CAD] algorithms being used to assess change."

From the RIDER database, a lung cancer case is shown pretreatment (top) and post-treatment (bottom).

The RIDER database includes "coffee break" scans -- cases for which the patients are scanned twice within a few minutes "to look for variability," he said. Selected cases from this indexed collection (in full DICOM format) can be retrieved and downloaded from the archive using a variety of DICOM-indexed queries.

A number of dynamic contrast-enhanced MRI cases also are coming online, along with PET/CT studies and phantom studies, all by the end of the year, according to McNitt-Gray. Farther down the road, the completed archive will contain a full range of annotated cases.

At press time, 84 cases were accessible in the public LIDC archive, said NCI spokesman Dr. Carl Jaffe, branch chief of the Cancer Imaging Program. From the industry-funded IDRI partnership, an additional 400 annotated cases of patients with nodules will be available by the end of 2008. According to Jaffe, the RIDER database currently has 306 cases in the publicly available archive and 185 more are in process for release in the near future.

By the end of 2009, "the entire LIDC and RIDER databases will be publicly available, with over 1,000 chest CTs, 300 chest x-rays with CT correlation, and further extension of the RIDER data," McNitt-Gray said. Together the LIDC, IDRI, and RIDER databases will enhance research into computer-assisted lung imaging applications.

By Eric Barnes
AuntMinnie.com staff writer
August 8, 2008
