Vision-language model exhibits bias reading chest x-rays

Wednesday, December 4 | 9:10 a.m.-9:20 a.m. | W2-STCE1-2 | Learning Center Theater 1

Clinical deployment of vision-language foundation models with inherent demographic biases could exacerbate healthcare disparities, especially for marginalized groups, according to a study to be delivered in this session.

Presenter Domenico Mastrodicasa, MD, of the University of Washington, and colleagues tested the fairness of a model called CheXzero, developed in 2022. They ran the model on chest x-rays from five radiology datasets (MIMIC, CheXpert, NIH, PadChest, and VinDr), testing specifically for demographic biases that could affect historically marginalized groups.

The datasets included 858,804 chest x-rays with pathology labels and demographic data from 230,570 patients. The CheXpert, PadChest, and VinDr datasets included radiologist labels, and the CheXpert (n = 666) and VinDr (n = 5,323) test sets included external annotations from three radiologists.

For the fairness evaluation, the researchers processed chest x-rays through the model using specific text prompts for a wide range of pathologies, for instance, “enlarged cardiomediastinum.” The radiologists’ labels served as benchmarks for evaluating diagnostic performance and fairness. Additionally, in a subset (n = 480) of the MIMIC dataset, the model’s predictions of demographic attributes (sex, age, race) were compared with those of three additional board-certified radiologists. The researchers then compared false-negative and false-positive rates across demographic groups to quantify underdiagnosis disparities and identify potential biases in the model.
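To make the group comparison concrete, here is a minimal Python sketch of how false-negative and false-positive rates can be computed per demographic group and compared. It is not the researchers' code: the function names (`group_error_rates`, `rate_gap`), the record layout, and the toy data are all assumptions chosen for illustration, and the study's exact protocol may differ.

```python
from collections import defaultdict

def group_error_rates(records):
    """Compute per-group error rates.

    records: iterable of (group, y_true, y_pred) with binary 0/1 labels,
    where y_true is the reference (e.g., radiologist) label and y_pred
    is the model's thresholded prediction. This layout is hypothetical.
    Returns {group: {"fnr": ..., "fpr": ...}}; a rate is None when the
    group has no positive (or no negative) reference cases.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for group, y_true, y_pred in records:
        if y_true == 1:
            key = "tp" if y_pred == 1 else "fn"
        else:
            key = "fp" if y_pred == 1 else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        rates[group] = {
            # False-negative rate: missed findings among true positives.
            "fnr": c["fn"] / positives if positives else None,
            # False-positive rate: false alarms among true negatives.
            "fpr": c["fp"] / negatives if negatives else None,
        }
    return rates

def rate_gap(rates, metric):
    """Largest minus smallest value of one metric across groups."""
    values = [r[metric] for r in rates.values() if r[metric] is not None]
    return max(values) - min(values) if values else None

# Toy, made-up records purely for illustration (not study data).
records = [
    ("female", 1, 0), ("female", 1, 1), ("female", 0, 0), ("female", 0, 0),
    ("male", 1, 1), ("male", 1, 1), ("male", 0, 1), ("male", 0, 0),
]
rates = group_error_rates(records)
fnr_gap = rate_gap(rates, "fnr")
```

A higher false-negative rate in one group than another is what underdiagnosis of that group looks like in these terms. Because the group key can be any hashable value, passing a tuple such as `("Black", "female")` would extend the same sketch to intersectional groups.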

According to the results, the model achieved high diagnostic accuracy but showed significant demographic biases, particularly underdiagnosing females, older adults, and racial minorities. Moreover, the compounded biases in intersectional groups (i.e., Black females) highlight the need for targeted interventions before deploying vision-language models in clinical settings, the researchers wrote.

Grab a coffee and sit in on this session for all the details.