CT screening study tracks reader agreement in nodule follow-up

Feb 13, 2011

A study based on National Lung Screening Trial data found that agreement between readers on nodule growth and screening results at follow-up was acceptable, but it could be better, according to researchers from the University of Alabama and several other U.S. centers.

Reader agreement on follow-up recommendations was lower still, except in a group of patients later proved to have lung cancer, where it approached 100%.

"Our findings demonstrated that agreement on nodule growth was similar to agreement on interpretation of screening result on prevalence CT screening examinations," wrote Satinder Singh, MD, from University of Alabama Hospitals in Birmingham, AL, and colleagues in a study published online January 19, 2011, in Radiology.

Substantial subjectivity

CT is highly sensitive for the detection of small lung nodules, but inasmuch as most nodules found on screening are benign or indeterminate, surveillance is the usual method of follow-up, the group reported. But follow-up evaluation -- including changes in size, attenuation, and contour over time -- opens the door to substantial reader subjectivity and variability, they wrote.

And while reader variability in the interpretation of baseline CT has been well studied, variability in the assessment of nodule changes at follow-up has not, Singh and colleagues wrote.

"The primary purpose of this study was to assess reader variability in determining whether an abnormality detected at baseline screening CT has changed at subsequent screening examinations," they wrote. "Because the most clinically relevant result of interpretation is the subsequent diagnostic action to be taken, we also evaluated the variability in radiologists' recommendations for further evaluation of the abnormality."

In the study, 100 cases were randomly chosen from more than 15,000 lung CT exams performed at 10 U.S. centers between 2003 and 2005 as part of the National Lung Screening Trial.

All cases consisted of nodules 4 mm or larger at one-year follow-up that were considered by all readers to have been present at baseline, though nodules could have been smaller or larger than 4 mm at baseline, the authors noted. Nodules considered by the original reader to have changed over the one-year surveillance period were oversampled in the study to focus on nodules that grew.

The follow-up scans were performed using MDCT scanners with at least four detectors and a low-dose protocol, including 120-140 kVp, 20-60 effective mAs, and contiguous or overlapping reconstruction section thickness of 2.5 mm or less that matched the section thickness of the baseline scan. All measurements were made bidimensionally, which introduces somewhat greater variability into size measurements compared with emerging measurement methods.

Nine experienced radiologists, viewing the data on a PACS network, evaluated whether the nodule was present at baseline and recorded the bidimensional measurements and nodule characteristics, presence or absence of change, results of screening CT, and follow-up recommendations that were categorized as high-level follow-up, low-level follow-up, or no follow-up recommended.

Based on reviews during case selection, five of the 100 nodules seen at the follow-up scan were deemed not to have been present at baseline.
For 19 of the remaining 95 cases, at least one reader judged the nodule not to have been present at the baseline scan.
The mean nodule size was 7.0 mm ± 5.0 (range, 1.5 to 40 mm) at baseline and 7.8 mm ± 5.6 at follow-up.
For the 76 nodules that were unanimously considered to have been present at baseline, individual readers deemed 21% to 47% (mean ± standard deviation, 30% ± 9) to have grown during the surveillance period.

Comparing CT scans at baseline to those at one-year follow-up, reader agreement on the presence or absence of nodule growth was moderate, with a κ coefficient of 0.55, the authors wrote.
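
For readers unfamiliar with the statistic, κ measures how far observed agreement exceeds the agreement expected by chance, with 1 meaning perfect agreement and 0 meaning chance-level agreement. The article does not say which form of κ the study used; the sketch below assumes Fleiss' κ, a common choice when more than two readers rate every case. The function name and the toy counts are hypothetical and are not data from the study.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    readers who assigned case i to category j.

    Assumes every case was rated by the same number of readers (>= 2).
    """
    n_cases = len(counts)
    n_readers = sum(counts[0])
    if n_readers < 2 or any(sum(row) != n_readers for row in counts):
        raise ValueError("each case needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_category = [
        sum(row[j] for row in counts) / (n_cases * n_readers)
        for j in range(n_categories)
    ]

    # Per-case agreement: fraction of reader pairs that chose the same category.
    p_case = [
        (sum(c * c for c in row) - n_readers) / (n_readers * (n_readers - 1))
        for row in counts
    ]

    p_observed = sum(p_case) / n_cases
    p_expected = sum(p * p for p in p_category)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data: nine readers, columns are [grew, did not grow].
toy_counts = [
    [9, 0],
    [7, 2],
    [1, 8],
    [0, 9],
    [5, 4],
]
toy_kappa = fleiss_kappa(toy_counts)
```

In this layout each row is one nodule and each column one answer, so a split vote such as 5 to 4 pulls κ down much more than a 7 to 2 vote does.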

Reader agreement on the need for high-level follow-up was good, with a κ coefficient of 0.66.
Agreement on changes in a small subset of proven lung cancer cases was very high, with readers detecting the change on average 97% of the time.
Agreement was low for a change in margins and attenuation (κ = 0.27-0.31), in contrast to the good agreement on recommending high- versus low-level follow-up (κ = 0.66).

Strong reader agreement on the need for high-level follow-up was an encouraging sign that important findings were nearly universally recognized as such, the authors said. For example, in a small subset of proven lung cancer cases, readers detected the nodule change 97% of the time, on average.

"A possible explanation for these findings is that readers were most focused on the binary decision of whether high-level follow-up was needed," the authors noted. "If high-level follow-up was deemed unnecessary, readers may have been less concerned about making a distinction between recommending low-level follow-up and no follow-up, knowing that another screening examination would be performed in one year even if no other diagnostic action is taken."

Overall, the findings showed that agreement on nodule growth over one year is similar to agreement on interpretation of initial screening results, Singh and colleagues wrote.

The moderate κ-values for reader agreement on nodule growth and CT screening result (κ = 0.55 and 0.51, respectively) were similar to those of a previous study, while the κ-values for agreement on changes in margin and attenuation were lower than those in previous reports.

This difference implies that readers found judging changes in nodule size to be more reliable than changes in CT attenuation or margins, the authors wrote.

"Therefore, most readers appear to have relied primarily on growth to judge the presence of change," they wrote. "Indeed, the κ-value for screening result ... was the same as that for growth of the nodule. The lower agreement on change in attenuation and/or margins likely prevented agreement on the screening result from being higher. It is reassuring that reader agreement for change among the small number of cancers was very high."

The National Lung Screening Trial didn't use computer-aided volumetric measurement methods "because they were not universally available at the different trial centers and lacked thorough validation and Food and Drug Administration approval at the time the trial was designed," the study team wrote.

Even so, there is recent evidence that volumetry itself introduces a number of sources of variability, including the computer algorithm, scan pitch, and reconstruction algorithm used.

Even using the same scanner and volumetry technique, studies have shown significant differences in the volume of lung nodules scanned more than one minute apart, "suggesting that automated serial measurements may be affected by factors such as differences in volume averaging and attenuation of the lung surrounding the nodule," Singh and colleagues wrote.

As for study limitations, the follow-up evaluation used postprocessing techniques such as maximum intensity projections and multiplanar reconstruction that weren't used in the original screening trial and that may have reduced reader variability, the group wrote. In addition, the retrospective nature of the study was "artificial" and may not have reflected how the readers would have performed in clinical practice.

In lung cancer screening with CT, a change in the size of noncalcified lung nodules appears to be the most important consideration in detecting nodule change and making follow-up recommendations, while reader agreement for these determinations "seems acceptable but could be improved," they wrote.

By Eric Barnes
AuntMinnie.com staff writer
February 14, 2011