ChatGPT shows potential for assisting in bone tumor diagnosis

Jan 23, 2024

ChatGPT shows potential in helping radiologists to identify malignant bone tumors based on CT imaging findings in radiology reports, according to a study published January 22 in the Journal of Bone Oncology.

A team led by Fan Yang, MD, of Capital Medical University in Beijing reported that ChatGPT with few-shot learning (that is, given only a small number of labeled examples to learn the task) showed 87% accuracy and 99% sensitivity for flagging malignant bone tumors.

"Our findings highlight the potential of ChatGPT in the diagnosis of benign and malignant bone tumors, offering advantages like enhanced efficiency and a reduction in missed diagnoses," the group wrote.

Bone lesions are commonly identified on CT, and although some are malignant, most turn out to be benign, Yang and colleagues explained. Ambiguous cases are challenging for ChatGPT because benign and malignant tumors can share imaging features, which is why "collaboration between physicians and ChatGPT is crucial in real-world settings," they noted.

To test ChatGPT's ability to identify malignant bone lesions, the team conducted a study that included 1,366 benign and malignant bone tumor-related imaging reports interpreted by 25 clinicians. The reports were entered into ChatGPT, which was trained using a few-shot learning method. The team then compared the physicians' results with the AI model's results (both before and after few-shot training) and analyzed misdiagnosed cases for diagnostic errors and their possible causes.

Few-shot learning improved ChatGPT's results, the researchers found.

ChatGPT performance for diagnosing bone tumors
Measure	Before few-shot learning	After few-shot learning
Accuracy	73%	87%
Sensitivity	95%	99%
Specificity	58%	73%
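For readers less familiar with these measures, the short sketch below shows how they are conventionally computed when "malignant" is treated as the positive class. It is a minimal illustration under stated assumptions: the ConfusionCounts class and the function names are hypothetical, and the counts are invented for illustration, not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical counts for a benign-vs-malignant classifier (malignant = positive)."""
    tp: int  # malignant reports correctly flagged as malignant
    fn: int  # malignant reports wrongly called benign (missed malignancies)
    tn: int  # benign reports correctly called benign
    fp: int  # benign reports wrongly flagged as malignant


def accuracy(c: ConfusionCounts) -> float:
    # Share of all reports classified correctly.
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def sensitivity(c: ConfusionCounts) -> float:
    # Share of malignant cases that were caught.
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Share of benign cases correctly recognized as benign.
    return c.tn / (c.tn + c.fp)


if __name__ == "__main__":
    # Invented counts (NOT the study's data): 100 malignant and 100 benign reports.
    counts = ConfusionCounts(tp=99, fn=1, tn=73, fp=27)
    print(f"accuracy={accuracy(counts):.2f}")
    print(f"sensitivity={sensitivity(counts):.2f}")
    print(f"specificity={specificity(counts):.2f}")
```

With these made-up counts, sensitivity and specificity match the after-training figures (0.99 and 0.73), but accuracy comes out at 0.86. That is because accuracy also depends on the mix of benign and malignant reports, which this sketch does not reproduce.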

The team also analyzed how radiologists' reporting style influenced ChatGPT, finding that the model had higher sensitivity when interpreting reports written by experienced radiologists. The group also found the following:

ChatGPT misdiagnosed 56 benign cases as malignant. Of these, 35 benign lesions were misidentified as metastatic tumors or osteosarcomas.
It misdiagnosed 23 osteosarcoma cases as osteomyelitis.
The algorithm misdiagnosed eight cases of chondrosarcoma as fibrous dysplasia or aneurysmal bone cyst.
ChatGPT also misdiagnosed four cases of spinal chordoma and spinal tuberculosis.

The study suggests that ChatGPT shows promise in the diagnosis of benign and malignant bone tumors, but collaboration with radiologist readers is necessary, according to the authors.

"[Our findings underscore] the necessity of collaborative interactions between physicians and ChatGPT in practical settings … [and] lays the groundwork for future AI advancements in medicine," they concluded. "Additionally, [it shows] the benefits of few-shot learning in fine-tuning ChatGPT applications in specialized fields."
