New recommendations seek to make medical AI safe and effective for everyone, from clinicians to patients, by tackling the issues of bias and data transparency.
The recommendations, published December 18 in The Lancet Digital Health and NEJM AI, outline factors that can contribute to AI bias. Xiao Liu, PhD, from the University of Birmingham in England, and colleagues led the effort, which drew on consensus input from over 350 experts in 58 countries.
“To create lasting change in health equity, we must focus on fixing the source, not just the reflection,” Liu said in a prepared statement.
Bias and data transparency have been two focal points for radiologists using AI. AI advocates say that for the technology to advance further into the clinic, these issues, among others, need to be addressed. They also say that current datasets do not adequately represent diverse populations: people in minority groups are likely to be underrepresented in datasets and may therefore be disproportionately affected by AI bias.
One international effort that seeks to address this is the STANDING Together (STANdards for data Diversity, INclusivity and Generalisability) initiative, which has developed recommendations for ensuring that AI healthcare technologies are supported by representative data.
“The recommendations aim to encourage transparency around ‘who’ is represented in the data, ‘how’ people are represented, and how health data is used,” according to STANDING Together’s website.
The initiative is being led by researchers at University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham. Collaborators from over 30 institutions around the world have worked to develop the recommendations.
The recommendations call for the following:
- Encourage medical AI to be developed using appropriate healthcare datasets that represent everyone in society, including minority and underserved groups.
- Help anyone who publishes healthcare datasets to identify any biases or limitations in the data.
- Allow researchers developing medical AI technologies to find out whether a dataset is suitable for their purposes.
- Define how AI technologies should be tested to identify whether they are biased.
The recommendation authors also provided guidance on identifying patients who may be harmed when medical AI systems are used. They wrote that dataset documentation should include data on relevant attributes related to individual patients. They added that patient groups at risk of disparate health outcomes should be highlighted.
“If including these data may place individuals at risk of identification or endanger them, these data should instead be provided at aggregate level,” they wrote. “If data on relevant attributes are missing, reasons for this should be stated.”
Liu compared data to a mirror, saying it reflects reality.
“And when distorted, data can magnify societal biases,” Liu said in a prepared statement. “But trying to fix the data to fix the problem is like wiping the mirror to remove a stain on your shirt.”
The authors wrote that they hope these recommendations will raise awareness that “no dataset is free of limitations.” This makes transparent communication of data limitations valuable, they added.
“We hope that adoption of the… recommendations by stakeholders across the AI health technology lifecycle will enable everyone in society to benefit from technologies which are safe and effective,” the authors wrote.
The full recommendations can be found here.