1. Entity Recognition in Famous Medical Records Based on BRL Neural Network Model
Hang YANG ; Yehui PENG ; Wei YANG ; Jiaheng WANG ; Zhiwei ZHAO ; Wenyuan XU ; Yuxin LI ; Yan ZHU ; Lihong LIU
Chinese Journal of Experimental Traditional Medical Formulae 2024;30(24):167-173
Objective: To improve the recognition accuracy of named entities in medical record texts and enable effective mining and use of medical record knowledge, a Bert-Radical-Lexicon (BRL) neural network model was constructed to recognize medical record entities, tailored to the characteristics of medical record texts. Method: We selected 408 medical records related to hypertension from the Complete Library of Famous Medical Records of Chinese Dynasties and constructed a dataset consisting of 1,672 medical records by manual labeling. We then randomly divided the dataset into three subsets: the training set (1,004 cases), the testing set (334 cases) and the validation set (334 cases). Based on this dataset, we built a BRL model that fused various text features of medical records, its variants BRL-B, BRL-L and BRL-R, and a baseline model, Base, for experiments. During the model training phase, we trained these models on the training set and, to reduce the risk of overfitting, continuously monitored the performance of each model on the validation set, saving the model with the best performance. Finally, we evaluated the performance of these models on the testing set. Result: Compared with the other models, the BRL model performed best in the medical record named entity recognition task, with an overall precision of 90.09%, a recall of 90.61%, and a harmonic mean of precision and recall (F1) of 90.35% across eight types of entities: disease, symptom, tongue manifestation, pulse condition, syndrome, method of treatment, prescription and traditional Chinese medicine (TCM). Compared with the Base model, the BRL model improved the overall F1 value of entity recognition by 5.22%, and the F1 value for pulse condition entities increased by 6.92%, the largest increase among the entity types.
Conclusion: By incorporating a variety of medical record text features in the embedding layer, the BRL neural network model has stronger named entity recognition ability and thus extracts more accurate and reliable TCM clinical information.
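The data split and the evaluation metrics described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the fixed random seed, and the exact-match scoring of entities as (start, end, type) tuples are assumptions made for illustration, since the abstract does not state how spans were matched.

```python
import random


def split_dataset(records, n_train, n_test, seed=0):
    """Randomly split records into training, testing and validation subsets.

    The seed is a hypothetical choice; whatever remains after the first two
    subsets becomes the validation set.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation


def harmonic_f1(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def entity_prf(gold, predicted):
    """Entity-level precision, recall and F1 under exact-match scoring.

    gold and predicted are sets of (start, end, entity_type) tuples;
    a prediction counts only if all three fields match (an assumption).
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, harmonic_f1(precision, recall)
```

Calling `split_dataset(range(1672), 1004, 334)` reproduces the reported subset sizes of 1,004, 334 and 334 cases, and `harmonic_f1(90.09, 90.61)` rounds to 90.35, which agrees with the F1 value reported above.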
2. Class-imbalance Prediction and High-dimensional Risk Factor Identification of Adverse Reactions of Traditional Chinese Medicine with Centralized Monitoring in Real-world Hospitals
Feibiao XIE ; Yehui PENG ; Wei YANG ; Jinfa TANG ; Juan LIU ; Weixia LI ; Hui ZHANG ; Dongyuan WU ; Yali WU ; Yuanming LENG ; Xinghua XIANG
Chinese Journal of Experimental Traditional Medical Formulae 2023;29(14):114-122
Objective: To predict adverse drug reactions (ADR) of traditional Chinese medicine (TCM) from high-dimensional, class-imbalanced data, and to classify and identify the risk factors affecting the occurrence of ADR, based on post-marketing safety data of TCM monitored centrally in real-world hospitals. Method: Ensemble clustering resampling combined with regularized Group Lasso regression was used to balance the high-dimensional, class-imbalanced ADR data, and the balanced datasets were then integrated to achieve ADR prediction and risk factor identification by category. Result: In a practical example applying the proposed method to TCM injection monitoring data, the accuracy, sensitivity, specificity and area under the receiver operating characteristic curve (AUC) of ADR prediction were all above 0.8 on the test set. Meanwhile, 40 risk factors affecting the occurrence of ADR were screened out from a total of 600 high-dimensional variables, and the effect of these risk factors on the occurrence of ADR was identified by classification weighting. The important risk factors were classified as follows: past history, medication information, name of combined drugs, disease status, number of combined drugs and personal data. Conclusion: For real-world data on rare ADR with a large number of clinical variables, this paper achieved accurate ADR prediction under high-dimensional and class-imbalanced conditions, and classified and identified the key risk factors and the clinical significance of their categories, so as to provide early risk warning for rational clinical drug use and combined drug use, as well as a scientific basis for the post-marketing safety reevaluation of TCM.
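The two ingredients named in the Method, resampling into balanced datasets and a Group Lasso penalty that selects variables by category, can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the authors' procedure: the majority class is partitioned at random here as a stand-in for the clustering-based resampling, the penalty uses the common square-root-of-group-size weighting, and all function names are hypothetical.

```python
import math
import random


def balanced_subsets(majority, minority, seed=0):
    """Build roughly balanced datasets from class-imbalanced data.

    The majority class (non-ADR cases) is shuffled and partitioned into
    chunks about the size of the minority class (ADR cases); every chunk
    is paired with all minority samples. Random partitioning is an
    assumption standing in for the clustering step in the abstract.
    """
    shuffled = list(majority)
    random.Random(seed).shuffle(shuffled)
    n_subsets = max(1, len(shuffled) // max(1, len(minority)))
    return [shuffled[i::n_subsets] + list(minority) for i in range(n_subsets)]


def group_soft_threshold(weights, groups, lam):
    """Proximal step of the Group Lasso penalty (block soft-thresholding).

    Each group of coefficients is shrunk toward zero as a unit; a group
    whose Euclidean norm falls below lam * sqrt(group size) is zeroed
    entirely, which removes that whole category of variables.
    """
    shrunk = list(weights)
    for indices in groups:
        norm = math.sqrt(sum(weights[i] ** 2 for i in indices))
        threshold = lam * math.sqrt(len(indices))
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0 else 0.0
        for i in indices:
            shrunk[i] = scale * weights[i]
    return shrunk
```

In a full pipeline, a penalized classifier would be fitted on each balanced subset and the fitted models combined; groups that survive the thresholding across subsets would correspond to categories of risk factors such as past history or medication information.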