Evaluation of the Application Potential and Challenges of Large Language Models in the Field of Laboratory Medicine
10.13602/j.cnki.jcls.2024.08.12
- VernacularTitle:大语言模型在检验医学领域的应用潜力与挑战评估
- Author:
Xiaoqin LU 1,2,3; Wei JIA; Yuxiang WU; Yongkang WU
- Author Information:
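1. Department of Laboratory Medicine, West China Hospital, Sichuan University, Chengdu 610041
2. The First People's Hospital of Jintang County, Chengdu 610400
3. School of Pharmacy and Laboratory Medicine, Ya'an Polytechnic College, Ya'an, Sichuan 625000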
- Keywords:
large language model;
medical laboratory;
artificial intelligence;
result interpretation;
case analysis
- From:
Chinese Journal of Clinical Laboratory Science
2024;42(8):619-623
- Country: China
- Language:Chinese
- Abstract:
Objective: To evaluate the performance of ChatGPT-4.0 and ERNIE Bot-4.0 in the field of laboratory medicine, and to explore their application potential and challenges in this professional domain. Methods: Using the national clinical medical laboratory technology (intermediate) examination questions as a benchmark, we compared the two models' mastery of laboratory medicine knowledge and the consistency of their answers. We also assessed the models' ability to interpret test results and assist diagnosis using 30 laboratory medicine cases. Results: In the clinical medical laboratory technology examination, both models passed the 60% qualification threshold. ChatGPT-4.0 was superior to ERNIE Bot-4.0 in answering speed and consistency, but its answering accuracy was significantly lower than that of ERNIE Bot-4.0 (73.25% vs 80.75%). ERNIE Bot-4.0's accuracy was higher than the average accuracy of clinical laboratory personnel in this examination (78.03%). In the accuracy analysis by question type, both models performed worst on experimental technology questions (ERNIE Bot-4.0: 66.32%, ChatGPT-4.0: 60.53%) and best on basic medical knowledge questions (both scoring 86.00%). In the case analysis test, ERNIE Bot-4.0 outperformed ChatGPT-4.0 in all categories. Both models performed well in routine case analysis but made errors in complex case analysis. Conclusion: In the field of laboratory medicine, both large language models have shown certain application potential. In a Chinese-language context in particular, ERNIE Bot-4.0 significantly outperforms ChatGPT-4.0 in answering accuracy and case analysis ability, indicating its relative advantage in domestic applications. However, both models still need improvement in experimental technical knowledge, complex case analysis capabilities, and the accuracy and consistency of their output. At the current stage, directly applying such general-purpose large language models to clinical test result interpretation and assisted diagnosis still carries certain risks; this study provides a new research direction for the interpretation of test reports.