Proficiency evaluation of large language models in medical laboratory technology education
10.3760/cma.j.cn116021-20250516-02108
- VernacularTitle:大语言模型在医学检验技术教育中的能力评估
- Author:
Yang WANG 1; Jiahao WU 1; Fan ZHANG 1; Jing CHENG 1; Hongxia TAN 1; Juan OUYANG 1; Junxun LI 1
Author Information
1. Department of Laboratory Medicine, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou 510080
- Publication Type:Journal Article
- Keywords:
Large language model;
Medical laboratory technology;
Clinical Medical Laboratory Technician Qualification Examination;
CLEAR framework;
Doubao;
Yuanbao
- From:
Chinese Journal of Medical Education Research
2025;24(11):1447-1453
- Country:China
- Language:Chinese
- Abstract:
Objective: To assess the professional knowledge proficiency of mainstream large language models (LLMs) in medical laboratory education and to explore their potential as educational aids for medical laboratory technology students.
Methods: A comprehensive evaluation was conducted using 400 authentic questions from the 2023 Chinese National Clinical Medical Laboratory Technician Qualification Examination. Five LLMs (Copilot, Grok, Yuanbao, Doubao, and Kimi) were tested through two rounds of interaction, using a zero-shot prompting strategy and an interaction-optimized prompting strategy. The accuracy of the answers and the quality of the generated content were evaluated. Performance disparities were analyzed using Cochran's Q test, and content quality was scored with the CLEAR framework (completeness, lack of false information, evidence-based reasoning, appropriateness, relevance).
Results: In the first round, Doubao achieved the highest overall accuracy (375/400), and the overall accuracy of Doubao and Yuanbao significantly outperformed that of Copilot and Kimi (P<0.001). After the second-round interactive optimization, the accuracy of Kimi improved significantly (P<0.05), whereas the other LLMs showed only slight improvements (P>0.05); Doubao still had the highest overall accuracy (380/400), and Doubao and Yuanbao significantly outperformed Copilot (P<0.005). Evaluation with the CLEAR framework showed that Yuanbao, Doubao, and Kimi significantly outperformed the foreign models in the dimensions of evidence-based reasoning (P<0.003) and completeness (P<0.05), demonstrating standardized citation of authoritative evidence and superior content quality.
Conclusions: The tested LLMs possess extensive medical laboratory knowledge. The accuracy of their answers and the quality of the generated content can be improved through single-question input, specifying evidence requirements, and enabling advanced reasoning functions. The domestic LLMs are comparable to the foreign LLMs in accuracy and have significant advantages in the dimensions of evidence-based reasoning and completeness. LLMs can serve as auxiliary tools for learning professional knowledge in medical laboratory technology.
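The between-model comparison described in the abstract, a Cochran's Q test on paired correct/incorrect outcomes per question, can be sketched in pure Python. This is a minimal illustration of the statistic only; the model count and answer data below are hypothetical placeholders, not the study's actual per-question results.

```python
def cochrans_q(rows):
    """Cochran's Q statistic for k related binary samples.

    rows: one inner list per question, each holding k binary outcomes
          (1 = model answered correctly, 0 = incorrect).
    Returns (Q, df). Q is compared against a chi-square distribution
    with df = k - 1 degrees of freedom; questions on which all models
    agree contribute nothing to the statistic.
    """
    k = len(rows[0])                                   # number of models
    col_totals = [sum(r[j] for r in rows) for j in range(k)]
    row_totals = [sum(r) for r in rows]
    n = sum(row_totals)                                # grand total of correct answers
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n * n)
    denominator = k * n - sum(t * t for t in row_totals)
    return numerator / denominator, k - 1

# Toy example: 4 questions answered by 3 hypothetical models.
answers = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 1, 0]]
q, df = cochrans_q(answers)
print(f"Q = {q:.3f}, df = {df}")  # → Q = 4.667, df = 2
```

In the study's setting there would be 400 rows (one per question) and k = 5 columns (one per LLM), so Q would be referred to a chi-square distribution with 4 degrees of freedom; pairwise post-hoc comparisons, as implied by the reported model-vs-model P values, would then follow a significant overall test.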