Abstract
Background: Diagnosing non-ST elevation myocardial infarction (NSTEMI) in busy emergency departments is challenging. Artificial intelligence (AI) systems, particularly large language models (LLMs), offer potential as clinical decision support tools. This study aimed to evaluate the reliability of ChatGPT and Gemini in NSTEMI cases by comparing their responses to multiple-choice questions with those of emergency physicians.
Methods: This prospective, cross-sectional study was conducted via an online survey among 1,106 emergency physicians in Turkey. The survey included ten NSTEMI-related multiple-choice questions based on the 2023 European Society of Cardiology guidelines. The same questions were presented to ChatGPT 4.0 and Gemini 2.5, queried using identical standardized prompts (temperature=0, no web access) on April 20, 2025. Statistical analyses were performed using SPSS 26.0.
Results: Both AI models significantly outperformed the physicians, each answering nine of ten questions correctly versus a physician mean score of 7.62±1.32 (P<0.001). Effect sizes indicated a very large difference relative to less experienced physicians and a moderate difference relative to specialists. Physician performance improved with experience, yet the AI models exceeded even the most experienced group. Participants from training and research hospitals scored higher than those from state hospitals.
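The magnitude of the reported gap can be illustrated with the summary statistics given above. This is a minimal sketch, assuming a one-sample framing (AI score treated as a fixed benchmark against the physician distribution); the study's exact test procedure is not specified in the abstract, so the calculation below is illustrative only.

```python
import math

# Summary statistics reported in the abstract:
n = 1106            # surveyed emergency physicians
phys_mean = 7.62    # mean physician score (out of 10)
phys_sd = 1.32      # standard deviation of physician scores
ai_score = 9        # questions each AI model answered correctly

# Cohen's d: standardized difference between the AI score
# and the physician mean, in units of the physician SD.
d = (ai_score - phys_mean) / phys_sd

# One-sample t statistic for the same comparison.
t = (ai_score - phys_mean) / (phys_sd / math.sqrt(n))

print(round(d, 2))   # ≈ 1.05, a large effect by conventional thresholds
print(round(t, 1))   # ≈ 34.8, consistent with the reported P < 0.001
```

An overall d of about 1.05 sits at the "large" end of conventional effect-size benchmarks, which is consistent with the abstract's description of a very large difference for less experienced physicians and a moderate one for specialists.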
Conclusion: ChatGPT and Gemini outperformed emergency physicians on NSTEMI clinical questions, highlighting AI's potential to enhance medical education, clinical decision support, and patient care. These findings, however, are limited by the non-proctored online setting and the absence of real clinical context. Future research should focus on optimizing AI-clinician collaboration for safe and effective integration.