Application and Interpretability of the Unbalanced Ensemble Algorithm LASSO-EasyEnsemble in Prognostic Prediction of Coronary Heart Disease
10.11783/j.issn.1002-3674.2025.02.008
- VernacularTitle:不平衡集成算法LASSO-EasyEnsemble在冠心病预后预测中的应用及可解释性研究
- Author:Jiaxin ZAN 1; Hong YANG; Jing TIAN
Author Information
1. Department of Epidemiology and Health Statistics, School of Public Health, Shanxi Medical University (030001)
- Publication Type:Journal Article
- Keywords:Coronary heart disease; Imbalanced data; Ensemble learning; Prognosis prediction; Interpretability
- From:Chinese Journal of Health Statistics 2025;42(2):197-203
- Country:China
- Language:Chinese
- Abstract:
Objective: To address the high noise and class imbalance encountered in prognosis prediction for coronary heart disease, this study constructs an EasyEnsemble imbalanced ensemble model after LASSO feature selection and evaluates its performance.

Methods: Using survey data from the National Health and Nutrition Examination Survey (NHANES) public database for 2009-2018, with follow-up through 2019, the prognosis of coronary heart disease was predicted with death due to the disease as the outcome. LASSO was used to select relevant features. An EasyEnsemble imbalanced ensemble prediction model, together with SMOTE+LightGBM, XGBoost, and Random Forest prediction models, was then constructed on the selected features. Grid search was used to tune the parameters of each model. Classification performance was evaluated using AUC, precision, specificity, G-mean, and performance curves. SHAP analysis was applied to interpret the models' results.

Results: The EasyEnsemble model showed the best overall performance, with an AUC of 0.80 (95% CI: 0.79~0.82), precision of 0.86 (95% CI: 0.78~0.93), specificity of 0.99 (95% CI: 0.98~0.99), and G-mean of 0.79 (95% CI: 0.76~0.83), a result also reflected in the performance curves. Age, serum phosphorus, diabetes, and albumin were identified as important factors influencing patient prognosis.

Conclusion: The LASSO-EasyEnsemble imbalanced ensemble model enables accurate prognosis prediction for patients with coronary heart disease. Combining it with SHAP can help clinicians better assess disease severity and identify at-risk groups for personalized patient management.
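
The abstract reports G-mean alongside precision and specificity. The following minimal sketch shows why G-mean is informative when classes are imbalanced. It assumes the usual definition, G-mean = sqrt(sensitivity × specificity), which the abstract does not spell out. The toy cohort (95 survivors, 5 deaths) and the function names are hypothetical and are not taken from the study.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels: 1 = death, 0 = survival."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, fp, tn, fn


def metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity, precision, and G-mean."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        # Assumed definition: geometric mean of sensitivity and specificity.
        "g_mean": math.sqrt(sensitivity * specificity),
    }


# Hypothetical toy cohort: 95 survivors followed by 5 deaths.
y_true = [0] * 95 + [1] * 5

# A trivial model that predicts "survival" for everyone.
m_trivial = metrics(y_true, [0] * 100)

# A model that flags 5 survivors by mistake and catches 4 of the 5 deaths.
y_pred = [1] * 5 + [0] * 90 + [1] * 4 + [0] * 1
m_better = metrics(y_true, y_pred)

print(m_trivial)
print(m_better)
```

The trivial model has higher accuracy than the second model, yet its G-mean is zero because it never identifies a death. This is the behaviour that makes G-mean a useful complement to accuracy on imbalanced outcomes. Under the same assumed definition, a G-mean of 0.79 with a specificity of 0.99 would correspond to a sensitivity of roughly 0.63 (0.79² / 0.99); the abstract does not report sensitivity, so this is only a back-of-envelope figure.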