Study on the Construction of a Question-Answer Corpus Dataset for Chinese Medical Knowledge Large Language Models

VernacularTitle:中文医学知识大模型问答语料数据集构建研究
Author: Tingyu LYU ¹ ; Xiaoying LI ; Ying ZHANG ; Yuyang LIU ; Jinhua DU ; Xinyi LI ; Yan LUO ; Xiaoli TANG ; Huiling REN ; Hui LIU ; Hao YIN
Author Information

1. Institute of Medical Information/Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing 100005
Keywords: large language models; corpus dataset; model evaluation; medicine
From: Journal of Medical Informatics 2024;45(5):20-25
Country: China
Language: Chinese
Abstract: Purpose/Significance: To construct a Chinese medical knowledge Q&A corpus dataset as a standardized evaluation benchmark for large language models (LLMs) in the medical domain, so as to improve the accuracy and efficiency of LLMs in handling Chinese medical questions. Method/Process: Knowledge from Chinese medical papers, medical terminology explanations, and supplementary questions are acquired from the Chinese medical licensing examination, and open-source Chinese medical Q&A datasets are incorporated into the developed Q&A dataset. Result/Conclusion: The Chinese medical knowledge Q&A corpus dataset enriches the sources of existing datasets and promotes the objective and comprehensive quantitative evaluation of large models in the medical field. In the near future, additional data such as electronic medical records and data from online health communities will be used to strengthen the support of artificial intelligence for the Healthy China strategy.