Construction of Traditional Chinese Medicine Question-Answering Large Language Model Based on Retrieval-Augmented Generation Technology
10.14148/j.issn.1672-0482.2024.1375
- VernacularTitle:基于检索增强生成技术的中医药问答大语言模型的构建
- Author:Yuming ZHANG 1; Hongyan LI; Xufeng LANG; Zuojian ZHOU; Yun LING; Ziyan WANG
Author Information
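1. School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing 210023, Jiangsu, China; Jiangsu Province Engineering Research Center of TCM Intelligence Health Service, Nanjing 210023, Jiangsu, China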
- Publication Type:Journal Article
- Keywords:TCM knowledge base; large language model; question-answering system; retrieval-augmented generation technology
- From:Journal of Nanjing University of Traditional Chinese Medicine 2024;40(12):1375-1382
- Country:China
- Language:Chinese
- Abstract:
OBJECTIVE To construct a large language model for TCM question-answering. METHODS TCM corpora were built by collecting TCM classics such as Treatise on Cold Damage, TCM textbooks, prescriptions from famous TCM doctors, and other manually annotated TCM datasets. A TCM knowledge vector library was constructed. Retrieval-augmented generation (RAG) was combined with the P-Tuning v2 fine-tuning method and the large language model ChatGLM2-6B to build the TCM question-answering large language model. RESULTS Precision, Recall, and F1 score were used as evaluation metrics for knowledge question-answering tasks. The model achieved over 90% accuracy in simple TCM question-answering, with the highest accuracy in component-type questions, reaching an F1 score of 0.928. The accuracy on medium- to high-difficulty questions ranged from 75.8% to 87.7%, with F1 scores all exceeding 0.766. Expert ratings based on diversity and accuracy were used as evaluation metrics for TCM question generation tasks, and the model in this paper scored 9.5 points higher than the baseline model. CONCLUSION The model in this paper demonstrates good semantic understanding and high reliability, effectively alleviating model hallucinations and helping patients clarify the intent of their questions. It is of great significance for advancing research on TCM knowledge and providing personalized interactive answers. It also provides an innovative approach to promoting the inheritance and popularization of TCM experience and the intelligent construction of TCM diagnosis and treatment.
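For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how Precision, Recall, and F1 score are conventionally computed from counts of correct and incorrect answers. It is an illustration only, not the authors' evaluation code: the function name `precision_recall_f1` and the example counts are hypothetical, and the abstract does not state how answers were matched against references.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard definitions from true-positive, false-positive, false-negative counts.

    Illustrative helper (hypothetical name); not taken from the paper.
    """
    # Precision: share of returned answers that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of reference answers that were returned.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts, chosen only to demonstrate the arithmetic.
    p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=5)
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values, so a reported F1 score implies that neither Precision nor Recall is far below it.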