Study on TCM Influenza Syndrome Differentiation Model Based on Machine Learning
10.19879/j.cnki.1005-5304.202403074
- VernacularTitle:基于机器学习的流行性感冒中医辨证模型研究
- Author:
Yuteng ZHANG 1,2; Hongchun ZHANG; Menglin CHEN; Xin JIN; Jian LIU
Author Information
1. Beijing University of Chinese Medicine, Beijing 100029
2. Respiratory Center, China-Japan Friendship Hospital, Beijing 100029
- Keywords:
influenza;
machine learning;
logistic regression model;
syndrome differentiation model;
feature engineering
- From:
Chinese Journal of Information on Traditional Chinese Medicine
2024;31(9):48-57
- Country:China
- Language:Chinese
- Abstract:
Objective: To train machine learning models on clinical syndrome data from influenza patients and obtain a TCM syndrome differentiation model for influenza.

Methods: Medical records of influenza patients who visited the fever clinic of China-Japan Friendship Hospital from December 2019 to March 2022 were collected. A data set system was used for data processing, and the data produced by each processing pipeline were stored separately for training. Logistic regression, decision tree, naive Bayes, support vector machine, multi-layer perceptron, LightGBM and random forest were selected as candidate models, and their hyperparameters were optimized with Optuna. Each model was trained separately on each data set, and prediction performance was evaluated with the macro-F1 score as the core metric.

Results: A total of 1011 training samples were collected, including 453 cases of wind-heat syndrome, 152 cases of superficial wind-cold syndrome, and 406 cases of superficial cold and internal heat syndrome. Eight data sets were obtained for training, containing 80 copies of data. After training, the macro-F1 scores of the logistic regression, decision tree, naive Bayes, support vector machine, multi-layer perceptron, LightGBM and random forest models were 0.7830, 0.7742, 0.7315, 0.7824, 0.7167, 0.7938 and 0.8153, respectively. Weighting the samples significantly improved average model performance, whereas PCA reduced it. Logistic regression had the best prediction performance among the single-method models, and random forest had the best among the ensemble models.

Conclusion: With a small sample size, logistic regression, decision tree, support vector machine and LightGBM are more appropriate for the TCM influenza syndrome differentiation model. As the sample size increases, logistic regression, support vector machine, LightGBM and random forest may be more suitable. Different data processing methods affect model performance. Collecting information on how typical each case is of its syndrome type helps improve model performance.
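
The macro-F1 score used as the core metric is the unweighted mean of the per-class F1 scores, so the smallest class (152 wind-cold cases out of 1011) counts as much as the larger ones. The following is a minimal sketch in Python, assuming the standard definition of macro-F1. The function name `macro_f1`, the label abbreviations and the toy predictions are illustrative assumptions and are not taken from the study.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: WH = wind-heat, WC = superficial wind-cold,
# CH = superficial cold and internal heat.
y_true = ["WH", "WH", "WH", "WC", "CH", "CH"]
y_pred = ["WH", "WH", "CH", "WC", "CH", "WH"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

In this toy case the per-class F1 scores are 2/3 (WH), 1 (WC) and 1/2 (CH), giving a macro-F1 of 13/18, while plain accuracy is 4/6. The two metrics differ because macro-F1 gives each syndrome class equal weight regardless of its size.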