SIIM: Imaging AI developers need meaningful training data

Jun 14, 2023

AUSTIN, TX -- Imaging machine-learning algorithms can produce more meaningful results if they are trained on data that don't rely on expert interpretations, according to a talk at the Society for Imaging Informatics in Medicine (SIIM) annual meeting.

In his keynote presentation on June 14 during the opening session of SIIM 2023, Ziad Obermeyer, MD, from the University of California, Berkeley, highlighted the pitfalls of current approaches to developing imaging AI algorithms and discussed what's needed to gather higher-quality training data.

"Machine learning should do better than humans, not just copy them, and that's the promise [of the technology]," Obermeyer said. "To do that, we need to do a lot of work to find better labels and start with what's the right question to ask instead of what data is available."

He acknowledged, however, that training data based on patient outcomes can be difficult to acquire for use in diverse populations.

Radiologists are continuing to explore the potential uses of AI in clinical practice, whether it be as an adjunct imaging reader or as a predictor of treatment success. However, the usual steps for developing machine-learning algorithms include relying on expert interpretations, or labeling, of images as the ground truth.

Obermeyer said two problems arise from this approach: it introduces human bias, and it is limited by the incompleteness of current medical knowledge.

Going beyond standard measures

While algorithms developed from imaging data can help make diagnoses, Obermeyer said many don't take patient-reported measures into account. This can further exacerbate disparities experienced among different patient populations.

Obermeyer previously co-led research that explored the effectiveness of an algorithm that incorporated patient-reported pain with knee x-rays. The researchers found that while standard severity measures graded by radiologists accounted for 9% of racial disparities in pain, algorithmic predictions accounted for 43% of disparities.

The team wrote that this suggests that pain experienced by underserved patients comes from factors within the knee that are not reflected in standard measures employed by radiologists.

Difficulty in gathering data

The data-gathering process can be arduous, with several steps required before usable data are finally acquired for developing algorithms. Obermeyer illustrated this challenge with a study he is co-leading that uses electrocardiogram data, for which data acquisition has taken several years because of regulatory and data extraction challenges.

He highlighted that this experience is an example of data access friction, where medical data used to develop algorithms or contribute to studies are hard to come by.

"These data have enormous value to society," Obermeyer said. "I think of health data as a public good. Of course we need to respect privacy, agency, and people's wishes about their data, but we also need to recognize that these data are an immense public good that I think are being dramatically underutilized today."

Such barriers to data access may allow problems in currently used algorithms, such as racial bias, to go undetected. Obermeyer said there are "very few" studies assessing this problem. He also called the benchmark performances of large language models such as ChatGPT "laughable."

"What's the first rule of machine learning? Don't evaluate your model on the same data you trained your model on," he said. "What are the odds that people building these models have not trained their models on medical board questions? Zero. There's absolutely no chance that there's not contamination. Those evaluation metrics are not meaningful."
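The contamination concern can be illustrated with a minimal Python sketch. It is not a method described in the talk: it assumes the evaluator can see the training text (which is often not the case for commercial models), and the names `fingerprint` and `split_contaminated` and the toy questions are hypothetical. It flags only evaluation items that match a training item after lowercasing and whitespace normalization, so paraphrased questions would slip through.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash a question after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def split_contaminated(train_texts, eval_texts):
    """Split evaluation items into (clean, contaminated) lists.

    An item counts as contaminated if its normalized text also
    appears in the training data (exact match only).
    """
    seen = {fingerprint(t) for t in train_texts}
    clean, contaminated = [], []
    for t in eval_texts:
        if fingerprint(t) in seen:
            contaminated.append(t)
        else:
            clean.append(t)
    return clean, contaminated


# Toy, made-up data for illustration only.
train = [
    "What is the first-line treatment for X?",
    "Name the  artery supplying Y.",
]
evals = [
    "name the artery supplying y.",
    "Which imaging finding suggests Z?",
]

clean, contaminated = split_contaminated(train, evals)
print("clean:", clean)
print("contaminated:", contaminated)
```

Only the items in `clean` would give a meaningful estimate of performance; scores on `contaminated` items may simply reflect memorization.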

Obermeyer also said that this can create an unsafe regulatory environment where "we don't know how these things are performing in the real world, because we just can't get the data."

What's needed?

Obermeyer said that in order to tackle these challenges, algorithm accountability and audits are needed.

Obermeyer in 2020 co-founded an algorithm audit service called Dandelion Health. Here, researchers and company executives can upload algorithms with legal and intellectual property protections into a neutral computing environment. From there, they can choose an evaluation option, and the site will run the algorithm on a separate dataset and provide a performance report that takes data quality, as well as factors related to race, geography, and gender, into account.
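As a rough illustration of what a stratified performance report involves, the Python sketch below computes sensitivity and specificity separately for each subgroup of a held-out dataset. It is not Dandelion Health's actual pipeline, which the talk did not describe in detail; the function `subgroup_report`, the record fields, and the toy records are all hypothetical.

```python
from collections import defaultdict


def subgroup_report(records, group_key):
    """Compute per-subgroup sensitivity and specificity.

    Each record is a dict with 'label' (0/1 outcome), 'pred' (0/1
    algorithm output), and a subgroup value stored under group_key.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for r in records:
        c = counts[r[group_key]]
        if r["label"] == 1:
            c["tp" if r["pred"] == 1 else "fn"] += 1
        else:
            c["fp" if r["pred"] == 1 else "tn"] += 1

    report = {}
    for group, c in counts.items():
        pos = c["tp"] + c["fn"]
        neg = c["tn"] + c["fp"]
        report[group] = {
            "n": pos + neg,
            # None when a subgroup has no positive (or negative) cases.
            "sensitivity": c["tp"] / pos if pos else None,
            "specificity": c["tn"] / neg if neg else None,
        }
    return report


# Toy, made-up records for illustration only.
records = [
    {"group": "A", "label": 1, "pred": 1},
    {"group": "A", "label": 1, "pred": 0},
    {"group": "A", "label": 0, "pred": 0},
    {"group": "A", "label": 0, "pred": 0},
    {"group": "B", "label": 1, "pred": 1},
    {"group": "B", "label": 1, "pred": 1},
    {"group": "B", "label": 0, "pred": 1},
    {"group": "B", "label": 0, "pred": 0},
]

report = subgroup_report(records, "group")
for group, metrics in sorted(report.items()):
    print(group, metrics)
```

A gap between subgroups, like the one in this toy data, is the kind of signal an audit would surface; the same function could be called with a geography or gender field in place of `"group"`.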

Also, Obermeyer in 2019 co-founded Nightingale Open Science, a nonprofit research platform that connects researchers with medical data. Its goal is to facilitate collaboration between computer-science researchers, clinicians, and economists in developing algorithms.

The platform works with high-dimensional data, such as medical images, and links them to actual patient outcomes as the ground truth, enabling algorithms to learn from "nature" and not just from humans, according to Obermeyer. Imaging data can also be easily de-identified. Obermeyer said that over the past year, the platform has been used by about 600 users from 40 countries, "most of them with no access to their own data."

"Data platforms that connect people like you [radiologist audience members] who know what you're doing to the data you need to do this work are going to be very important," he said.