Construction of a Diagnostic Model for Traditional Chinese Medicine Syndromes of Chronic Cough Based on the Voting Ensemble Machine Learning Algorithm

Yichen BAI; Suyang QIN; Chongyun ZHOU; Liqing SHI; Kun JI; Chuchu ZHANG; Panfei LI; Tangming CUI; Haiyan LI


VernacularTitle:基于机器学习Voting集成算法的慢性咳嗽中医证候诊断模型构建
Author: Yichen BAI ¹ ; Suyang QIN ¹ ; Chongyun ZHOU ¹ ; Liqing SHI ² ; Kun JI ² ; Chuchu ZHANG ¹ ; Panfei LI ³ ; Tangming CUI ¹ ; Haiyan LI ¹
Author Information

1. Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing, 100700
2. Dongfang Hospital, Beijing University of Chinese Medicine
3. School of Medical Technology and Information Engineering, Zhejiang Chinese Medical University
Publication Type: Journal Article
Keywords: chronic cough; machine learning; syndrome; diagnosis model; Voting ensemble learning
From: Journal of Traditional Chinese Medicine 2025;66(11):1119-1127
Country: China
Language: Chinese
Abstract:
Objective: To explore the construction of a machine learning model for diagnosing traditional Chinese medicine (TCM) syndromes in chronic cough, and to optimize that model with the Voting ensemble algorithm.
Methods: A retrospective analysis was conducted on clinical data from 921 patients with chronic cough treated at the Respiratory Department of Dongfang Hospital, Beijing University of Chinese Medicine. After standardized processing, 84 clinical features were extracted to determine TCM syndrome types. A specialized dataset for TCM syndrome diagnosis in chronic cough was formed by selecting the syndrome types with more than 50 cases. The synthetic minority over-sampling technique (SMOTE) was used to balance the dataset. Four base models, logistic regression (LR), decision tree (DT), multilayer perceptron (MLP), and Bagging, were constructed and integrated with a hard voting strategy to form a Voting ensemble model. Model performance was evaluated using accuracy, recall, precision, F1-score, the receiver operating characteristic (ROC) curve, the area under the curve (AUC), and the confusion matrix.
Results: Among the 921 cases, six syndrome types had more than 50 cases each: phlegm-heat obstructing the lung (294 cases), wind pathogen latent in the lung (103 cases), cold-phlegm obstructing the lung (102 cases), damp-heat stagnating in the lung (64 cases), lung yang deficiency (54 cases), and phlegm-damp obstructing the lung (53 cases), for a total of 670 cases in the specialized dataset. High-frequency symptoms among these patients included cough, expectoration, odor-induced cough, throat itchiness, itch-induced cough, and cough triggered by cold wind. Among the four base models, the MLP model showed the best diagnostic performance (test accuracy: 0.9104; AUC: 0.9828). The Voting ensemble model outperformed the base models, with an accuracy of 0.9289 on the training set and 0.9253 on the test set, a gap of only 0.0036 that suggests minimal overfitting. It also achieved the highest AUC on the test set (0.9836), exceeding all base models. The model exhibited especially strong diagnostic performance for damp-heat stagnating in the lung (AUC: 0.9984) and wind pathogen latent in the lung (AUC: 0.9970).
Conclusion: The Voting ensemble algorithm effectively integrates the strengths of multiple machine learning models, yielding an optimized diagnostic model for TCM syndromes in chronic cough with high accuracy and enhanced generalization ability.
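The hard voting strategy named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the shortened syndrome labels, the example predictions, and the tie-breaking rule (alphabetically smallest label) are all assumptions made for illustration, since the abstract does not say how ties among the four base models are resolved. The final line reproduces the reported train/test accuracy gap.

```python
from collections import Counter

def hard_vote(predictions):
    """Return the most frequent label among the base-model predictions for one sample.

    Tie-breaking is an assumption: the alphabetically smallest tied label wins.
    """
    counts = Counter(predictions)
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)

def vote_all(per_model_predictions):
    """Combine equal-length label lists (one list per base model) sample by sample."""
    return [hard_vote(sample) for sample in zip(*per_model_predictions)]

# Hypothetical predictions from the four base models for three patients
lr      = ["phlegm-heat", "cold-phlegm",   "damp-heat"]
dt      = ["phlegm-heat", "wind pathogen", "cold-phlegm"]
mlp     = ["phlegm-heat", "cold-phlegm",   "damp-heat"]
bagging = ["cold-phlegm", "wind pathogen", "damp-heat"]

ensemble = vote_all([lr, dt, mlp, bagging])

# Train/test accuracy gap, using the figures reported in the abstract
gap = round(0.9289 - 0.9253, 4)
```

With an even number of base models, two-against-two splits are possible (the second patient above), so any real implementation needs an explicit tie rule.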