Joint Relation Extraction of Famous Medical Cases with CasRel Model Combining Entity Mapping and Data Augmentation
10.13422/j.cnki.syfjx.20251866
- VernacularTitle:结合数据增强与实体映射CasRel模型的名家医案联合关系抽取
- Author:Yuxin LI¹; Xinghua XIANG¹; Hang YANG¹; Dasheng LIU¹; Jiaheng WANG¹; Zhiwei ZHAO¹; Jiaxu HAN¹; Mengjie WU¹; Qianzi CHE¹; Wei YANG¹
Author Information
1. Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Publication Type:Journal Article
- Keywords:
data augmentation;
famous medical cases;
relationship extraction;
joint learning approach;
cascade binary tagging framework for relation triple extraction(CasRel) model;
knowledge graph
- From:
Chinese Journal of Experimental Traditional Medical Formulae
2026;32(2):218-225
- Country:China
- Language:Chinese
- Abstract:
Objective: To address the challenges of unstructured classical Chinese expressions, nested entity relationships, and limited annotated data in famous traditional Chinese medicine (TCM) case records, this study proposes a joint relation extraction framework that integrates data augmentation and entity mapping, aiming to support the construction of TCM diagnostic knowledge graphs and clinical pattern mining.
Methods: We developed an annotation structure for entities and their relationships in TCM case texts and applied a data augmentation strategy that incorporates multiple ancient texts to expand the relation extraction dataset. A cascade binary tagging framework for relation triple extraction (CasRel) model adapted to TCM semantics was designed. It integrates a bidirectional encoder representations from transformers (BERT) layer pre-trained on classical TCM texts to enhance semantic representation, and uses a head entity-relation-tail entity mapping mechanism to address entity nesting and relation overlapping.
Results: The CasRel model combining data augmentation and entity mapping outperformed the pipeline-based Bert-Radical-Lexicon (BRL)-bidirectional long short-term memory (BiLSTM)-Attention model. The overall precision, recall, and F1-score across 12 relation types reached 65.73%, 64.03%, and 64.87%, improvements of 14.26%, 7.98%, and 11.21%, respectively, over the BRL-BiLSTM-Attention model. Notably, the F1-score for tongue-syndrome relations increased by 22.68% (reaching 69.32%), and prescription-syndrome relations performed best, with an F1-score of 70.10%.
Conclusion: The proposed framework significantly improves semantic representation and the modeling of complex dependencies in TCM texts, offering a reusable technical framework for structured mining of TCM case records. The constructed knowledge graph can support clinical syndrome differentiation, prescription optimization, and drug compatibility analysis, providing a methodological reference for TCM artificial intelligence research.
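To illustrate the head entity-relation-tail entity mapping named in the Methods, the following is a minimal Python sketch of cascade decoding in the general CasRel style. It is not the authors' implementation. The function names (`decode_spans`, `decode_triples`, `make_toy_tagger`), the 0.5 threshold, the relation labels, and the toy tokens and probabilities are all assumptions made for illustration. In a trained model the tagging probabilities would come from a BERT encoder, and the tail entity tagger would be conditioned on the head entity; the toy tagger here ignores it.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]  # inclusive token indices (start, end)
Tagger = Callable[[Span], Tuple[List[float], List[float]]]


def decode_spans(start_probs: List[float], end_probs: List[float],
                 threshold: float = 0.5) -> List[Span]:
    """Pair each tagged start with the nearest tagged end at or after it."""
    spans = []
    for i, p_start in enumerate(start_probs):
        if p_start < threshold:
            continue
        for j in range(i, len(end_probs)):
            if end_probs[j] >= threshold:
                spans.append((i, j))
                break
    return spans


def decode_triples(tokens: List[str],
                   head_start: List[float], head_end: List[float],
                   tail_taggers: Dict[str, Tagger],
                   threshold: float = 0.5) -> List[Tuple[str, str, str]]:
    """Step 1: tag head entities. Step 2: for each head entity and each
    relation, tag tail entities. One head may yield several triples,
    which is how overlapping relations are represented."""
    triples = []
    for h_start, h_end in decode_spans(head_start, head_end, threshold):
        head = " ".join(tokens[h_start:h_end + 1])
        for relation, tagger in tail_taggers.items():
            tail_start, tail_end = tagger((h_start, h_end))
            for t_start, t_end in decode_spans(tail_start, tail_end, threshold):
                tail = " ".join(tokens[t_start:t_end + 1])
                triples.append((head, relation, tail))
    return triples


def make_toy_tagger(n_tokens: int, tail_span: Span) -> Tagger:
    """Hypothetical stand-in for a relation-specific tail tagger."""
    def tagger(_head_span: Span) -> Tuple[List[float], List[float]]:
        start = [0.0] * n_tokens
        end = [0.0] * n_tokens
        start[tail_span[0]] = 0.9
        end[tail_span[1]] = 0.9
        return start, end
    return tagger


# Toy input (illustrative only, not taken from the study's dataset).
tokens = ["spleen", "deficiency", "pale", "tongue", "Sijunzi", "decoction"]
head_start = [0.9, 0.1, 0.1, 0.1, 0.1, 0.1]
head_end = [0.1, 0.9, 0.1, 0.1, 0.1, 0.1]
tail_taggers = {
    "tongue-syndrome": make_toy_tagger(len(tokens), (2, 3)),
    "prescription-syndrome": make_toy_tagger(len(tokens), (4, 5)),
}

triples = decode_triples(tokens, head_start, head_end, tail_taggers)
for triple in triples:
    print(triple)
```

In this toy case the single head entity "spleen deficiency" is mapped to two tail entities under two different relations, which a pipeline that assigns one relation per entity pair in sequence would handle less directly.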