CMIMI: ChatGPT classifies clinically significant breast pain

BOSTON -- ChatGPT shows early promise in classifying clinically significant breast pain, according to research presented at the Conference on Machine Intelligence in Medical Imaging (CMIMI).

In a poster presentation at the Society for Imaging Informatics in Medicine (SIIM)-hosted meeting, a team led by Hana Haver, MD, from Brigham and Women’s Hospital in Boston shared research showing that the large language model agreed with breast imagers in nearly three out of four breast pain symptom cases. ChatGPT also correctly identified most clinically significant breast symptoms.

“For women who are super benign, we can save them a trip in and if we have extra appointments, we could [use] them for patient evaluations,” Haver told AuntMinnie.com.

The American College of Radiology’s (ACR’s) appropriateness criteria define clinically insignificant breast pain as nonfocal, diffuse, or cyclical pain. This pain is not associated with malignancy and does not require imaging. Diagnostic imaging typically yields no suspicious findings in women with clinically insignificant breast pain.

Haver and colleagues noted that identifying women with clinically insignificant pain can help triage them so they avoid unnecessary diagnostic imaging. They tested ChatGPT-4’s ability to automate the classification of common breast pain symptoms by clinical significance.

The study included 150 breast pain symptoms representing clinical variants described in the ACR appropriateness criteria. Also, 30 breast pain symptoms (15 clinically significant, 15 clinically insignificant) included non-pain-related clinically significant symptoms, such as palpable lumps and pathologic nipple discharge.

The researchers used a zero-shot prompt for ChatGPT-4 to characterize breast symptoms as clinically significant or insignificant, according to the criteria. Three radiologists served as ground-truth analysts.

GPT-4 assigned the appropriate clinical significance, in agreement with breast imagers, in 74.7% of breast pain symptoms. It correctly identified 89.1% of clinically significant symptoms. Among women with clinically insignificant symptoms, it correctly identified 64%.
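To see how the three figures relate, the short Python sketch below computes overall agreement and the two per-class rates from a list of radiologist labels and model labels. The counts are an assumption: the article does not report them, and the split used here (64 significant and 86 insignificant symptoms) was chosen only because it rounds to the reported percentages. The function name `triage_metrics` is likewise hypothetical.

```python
def triage_metrics(truth, predicted):
    """Compare model labels with ground truth.

    Both arguments are lists of booleans, where True means
    "clinically significant" and False means "clinically insignificant".
    """
    pairs = list(zip(truth, predicted))
    # Significant symptoms the model also called significant.
    true_pos = sum(t and p for t, p in pairs)
    # Insignificant symptoms the model also called insignificant.
    true_neg = sum((not t) and (not p) for t, p in pairs)
    n_significant = sum(truth)
    n_insignificant = len(truth) - n_significant
    return {
        "agreement": (true_pos + true_neg) / len(truth),
        "significant_identified": true_pos / n_significant,
        "insignificant_identified": true_neg / n_insignificant,
    }


# ASSUMED counts, not taken from the study: 64 significant symptoms
# (57 flagged correctly) and 86 insignificant symptoms (55 flagged correctly).
truth = [True] * 64 + [False] * 86
predicted = [True] * 57 + [False] * 7 + [False] * 55 + [True] * 31

metrics = triage_metrics(truth, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Under these assumed counts, the model would over-call 31 insignificant symptoms as significant while missing 7 significant ones, which is the kind of asymmetry a conservative classifier produces.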

With this in mind, diagnostic imaging appointments should be prioritized for women with clinically significant pain, the researchers noted. They also highlighted that while GPT-4 in its current state is limited in this area, large language models could soon help identify appropriate imaging needs based on ACR recommendations.

“While it looks like a lower number were correct, it in fact was more conservative and saying more of those [women] should come in for an evaluation,” Haver told AuntMinnie.com. “In the conversation of breast cancer, it’s a completely reasonable thing to do.”

Check out AuntMinnie.com's coverage of CMIMI 2024 here.
