ChatGPT-4 performs well on practice questions for RT exam

ChatGPT-4 performed well on practice questions designed to mimic those used on radiologic technologist (RT) certification exams, according to a study published August 16 in Academic Radiology.

Researchers led by Yousif Al-Naser from McMaster University in Hamilton, Ontario, Canada, found that while the chatbot achieved an overall score of about 80%, its accuracy on image-based questions was markedly lower than its accuracy on text-based questions.

“By identifying where ChatGPT excels and where it falls short, educators and technologists can better strategize how to integrate AI into educational frameworks effectively,” Al-Naser and colleagues wrote.

Radiologists and technologists continue to test generative AI's merit on medical board exams, and ChatGPT has shown mixed results on exams from various radiologic societies. A 2023 study, for example, found that ChatGPT-4 performed well on image-independent practice questions for the American College of Radiology (ACR) Diagnostic In-Training Exam (ACR DXIT); a 2024 study, however, found that the chatbot performed poorly on the actual ACR DXIT exam.

In their study, Al-Naser and colleagues used a dataset of 200 multiple-choice questions marketed by education firm BoardVitals as practice questions for the American Registry of Radiologic Technologists (ARRT) Certification Exam. (Note: The ARRT is not affiliated with BoardVitals, and a representative told AuntMinnie.com that the organization does not share any exam items or practice questions for research or commercial purposes.)

The researchers fed each question to ChatGPT-4 15 times, yielding 3,000 observations, to account for response variability.
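As a rough illustration of that kind of repeated-sampling protocol (not the authors' actual pipeline), a minimal Python sketch using the OpenAI client might look like the following; the model name, prompt format, and answer-parsing logic are assumptions for illustration only.

# Hypothetical sketch of repeated sampling to estimate per-question accuracy.
# The question data, model identifier, and scoring rule are illustrative
# assumptions, not the study's actual code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

N_TRIALS = 15  # ask each question repeatedly to capture response variability

def ask_once(question: str, choices: list[str]) -> str:
    """Send one multiple-choice question and return the model's raw reply."""
    options = "\n".join(f"{letter}. {text}" for letter, text in zip("ABCD", choices))
    response = client.chat.completions.create(
        model="gpt-4",  # assumed model identifier
        messages=[
            {"role": "system", "content": "Answer with the single letter of the best option."},
            {"role": "user", "content": f"{question}\n{options}"},
        ],
    )
    return response.choices[0].message.content.strip()

def accuracy_over_trials(question: str, choices: list[str], correct: str) -> float:
    """Fraction of N_TRIALS attempts whose first character matches the answer key."""
    hits = sum(ask_once(question, choices)[:1].upper() == correct for _ in range(N_TRIALS))
    return hits / N_TRIALS

Averaging such per-question accuracies over a 200-question set would produce the kind of aggregate score the study reports.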

ChatGPT-4 achieved an overall performance of 80.56%, with higher accuracy on text-based questions (86.3%) than on image-based questions (45.6%). The chatbot also took longer to respond to image-based questions (18.01 seconds) than to text-based questions (13.27 seconds).

Additionally, ChatGPT-4's performance varied by content domain: it achieved accuracies of 72.6% for safety, 70.6% for image production, 67.3% for patient care, and 53.4% for procedures. Finally, the large language model performed best on questions deemed easy (78.5%).

The study authors called for further development of such AI models, particularly in image processing and interpretation, to broaden their use in educational settings. They added that understanding ChatGPT's strengths and weaknesses could guide how the model is integrated into education and help improve outcomes for radiologic technology students.

“These tools can provide students with interactive, AI-driven quizzes that offer immediate feedback and explanations, improving their understanding of radiographic imaging principles,” the authors wrote. “ChatGPT has the potential to enhance accessibility by providing on-demand content, which is beneficial for supplementary learning outside of classroom environments.”

The full results can be found here.
