Artificial intelligence in foot and ankle pathology: Can large language models replace us?




Diagnosis, Treatment Outcome, Observer variation


Objective: To determine whether large language models (LLMs) provide information that is better than or similar to that of experts trained in foot and ankle pathology across several aspects of daily practice (definition of pathology, treatment of pathology, and general questions).

Methods: Three experts and two artificial intelligence (AI) models, ChatGPT (GPT-4) and Google Bard, answered 15 specialty-related questions, divided equally among definitions, treatments, and general queries. The responses were coded, redistributed, and evaluated by five additional experts on aspects such as clarity, factual accuracy, and usefulness to patients. Each answer was scored on a Likert scale reflecting the evaluator’s agreement with the information provided.

Results: With five evaluators using the Likert scale, each question could score between 5 and 25 points, so each respondent’s total across the 15 questions could range from 75 to 375 points. Expert 2 led with 69.86%, followed by Expert 1 at 68.53%, ChatGPT at 64.80%, Expert 3 at 58.40%, and Google Bard at 54.93%. Significant differences emerged between respondents, especially in comparisons involving Google Bard. Rankings varied across individual sections such as definitions and treatments, and GPT-4’s performance in particular varied from section to section. Overall, the results show clear differences in performance among the experts and the AI models.

Conclusion: Our findings indicate that GPT-4 often performed comparably to or even better than the experts, particularly in the definition and general question sections. However, both LLMs lagged notably in the treatment section. These results underscore the potential of LLMs as valuable tools in orthopedics while highlighting their limitations and the irreplaceable role of human expertise in intricate medical contexts.

Level of Evidence: III, observational, analytical.
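The scoring arithmetic in the Results can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state outright: each of the five evaluators gives one Likert rating from 1 to 5 per question, and the reported percentages are taken over the maximum possible total. The function name `percent_of_max` and the example ratings are hypothetical and are not study data.

```python
# Sketch of the scoring arithmetic described in the abstract.
# Assumptions (not confirmed by the source): one Likert rating (1-5) per
# evaluator per question; percentages are relative to the maximum total.
EVALUATORS = 5
QUESTIONS = 15
LIKERT_MIN, LIKERT_MAX = 1, 5

min_per_question = EVALUATORS * LIKERT_MIN   # 5 points
max_per_question = EVALUATORS * LIKERT_MAX   # 25 points
min_total = QUESTIONS * min_per_question     # 75 points
max_total = QUESTIONS * max_per_question     # 375 points


def percent_of_max(ratings):
    """Return a respondent's score as a percentage of the maximum total.

    ratings: one list per question, each holding one rating per evaluator.
    """
    total = sum(sum(question) for question in ratings)
    return 100 * total / max_total


# Hypothetical ratings for illustration only (not taken from the study).
example = [[4, 4, 3, 5, 4] for _ in range(QUESTIONS)]
print(min_total, max_total)      # 75 375
print(percent_of_max(example))   # 80.0
```

Under these assumptions, a respondent rated 20 out of 25 on every question would total 300 of 375 points, or 80%.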




How to Cite

Segura, F. P., Segura, F. M., Porta, J., Heredia, N., Masquijo, I., Anain, F., … Segura, F. V. (2024). Artificial intelligence in foot and ankle pathology: Can large language models replace us? Journal of the Foot & Ankle, 18(1), 52–58.