Abstract
Background: Diagnosing non-ST elevation myocardial infarction (NSTEMI) in busy emergency departments is challenging. Artificial intelligence (AI) systems, particularly large language models (LLMs), offer potential as clinical decision support tools. This study aimed to evaluate the reliability of ChatGPT and Gemini in NSTEMI cases by comparing their responses to multiple-choice questions with those of emergency physicians.
Methods: This prospective, cross-sectional study was conducted via an online survey among 1,106 emergency physicians in Turkey. The survey included ten NSTEMI-related multiple-choice questions based on the 2023 European Society of Cardiology guidelines. The same questions were presented to ChatGPT 4.0 and Gemini 2.5, queried using identical standardized prompts (temperature=0, no web access) on April 20, 2025. Statistical analyses were performed using SPSS 26.0.
Results: Both AI models significantly outperformed the physicians, each answering nine of ten questions correctly versus a physician mean score of 7.62±1.32 (P<0.001). Effect sizes indicated a very large difference relative to less experienced physicians and a moderate difference relative to specialists. Physician performance improved with experience, yet the AI models exceeded even the most experienced group. Participants from training and research hospitals scored higher than those from state hospitals.
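The magnitude of the reported gap can be illustrated with the summary statistics given above. This is a minimal sketch, assuming a one-sample framing (AI score treated as a fixed benchmark against the physician distribution); the study's exact test procedure is not specified in the abstract, so the calculation below is illustrative only.

```python
import math

# Summary statistics reported in the abstract:
n = 1106            # surveyed emergency physicians
phys_mean = 7.62    # mean physician score (out of 10)
phys_sd = 1.32      # standard deviation of physician scores
ai_score = 9        # questions each AI model answered correctly

# Cohen's d: standardized difference between the AI score
# and the physician mean, in units of the physician SD.
d = (ai_score - phys_mean) / phys_sd

# One-sample t statistic for the same comparison.
t = (ai_score - phys_mean) / (phys_sd / math.sqrt(n))

print(round(d, 2))   # ≈ 1.05, a large effect by conventional thresholds
print(round(t, 1))   # ≈ 34.8, consistent with the reported P < 0.001
```

An overall d of about 1.05 sits at the "large" end of conventional effect-size benchmarks, which is consistent with the abstract's description of a very large difference for less experienced physicians and a moderate one for specialists.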
Conclusion: ChatGPT and Gemini outperformed emergency physicians on NSTEMI clinical questions, highlighting AI's potential to enhance medical education, clinical decision support, and patient care. These findings, however, are limited by the non-proctored online setting and the absence of real clinical context. Future research should focus on optimizing AI-clinician collaboration for safe and effective integration.