Named Entity Recognition of Traditional Chinese Medicine Ancient Records Based on Multi-feature Fusion
10.3969/j.issn.1673-6036.2024.11.008
- VernacularTitle:多特征融合的中医古籍医案命名实体识别研究
- Author: Luyao ZHANG (1); Jianhua SHU; Peng WANG; Hongxing KAN; Yongxiang XU; Jie ZHOU; Shuxuan TANG
- Author Information: 1. School of Medical Information Engineering, Anhui University of Chinese Medicine, Hefei 230012
- Keywords: traditional Chinese medicine (TCM) ancient records; named entity recognition (NER); corpus; dictionary; natural language processing (NLP)
- From: Journal of Medical Informatics 2024;45(11):50-58
- Country: China
- Language: Chinese
- Abstract:
Purpose/Significance: To construct a named entity corpus of traditional Chinese medicine (TCM) ancient records, and to improve the recognition accuracy and applicability of general-domain named entity recognition (NER) models in the field of TCM ancient records.
Method/Process: Annotation standards for entities in TCM ancient records are formulated, and 2,384 Xin'an medical records are annotated. A RoBERTa-BiLSTM-CRF model is developed: the RoBERTa pre-trained language model generates word vectors carrying semantic features, and the BiLSTM-CRF layers learn the global semantic features of each sequence and decode the optimal label sequence. Dictionary and rule features are incorporated to strengthen the model's recognition of entity boundaries and categories.
Result/Conclusion: The model performs well on the named entity corpus of Xin'an medical cases. Integrating domain terminology dictionaries and rule-based features raises the overall F1 score to 72.8%.