Evaluation paradigms for conversational AI in healthcare: Systematic review
10.3969/j.issn.1674-2982.2025.07.010
- VernacularTitle:医疗健康领域中对话式人工智能的评估范式:系统综述
- Author:
Wei-zhen LIAO 1; You-li HAN 1; Cheng-yu MA 1
Author Information
1. School of Public Health, Capital Medical University, Beijing 100069, China
- Publication Type:Journal Article
- Keywords:
Large language model;
Conversational AI;
Evaluation index;
Evaluation system;
Healthcare
- From:
Chinese Journal of Health Policy
2025;18(7):78-86
- Country: China
- Language:Chinese
- Abstract:
Objective: This study aims to systematically review the current evaluation paradigms of conversational AI in healthcare and provide insights to facilitate the development of a comprehensive evaluation framework and methodological advancements in this field. Methods: A systematic review was conducted by searching the PubMed and Web of Science databases to analyze the existing evaluation paradigms of healthcare conversational AI, including evaluation subjects, assessment metrics, and evaluation methodologies. Results: A total of 60 studies were included in this review. The findings indicate that most evaluation subjects focus on general-purpose large language models. The assessment metrics cover five key dimensions: technical performance, information quality, clinical effectiveness, user experience, and ethics and safety. However, the evaluation criteria used in existing studies differed significantly. Other issues included poor alignment between evaluation questions and application scenarios, as well as a lack of diversity in evaluator roles. Conclusions: The current evaluation framework for healthcare conversational AI remains underdeveloped. Future improvements should focus on broadening model coverage, enhancing the comprehensiveness of evaluation indicators, standardizing evaluation methods, improving the operationalizability of test content, and expanding the scalability of evaluation languages.