Application of SMOTE_ENN Combined with AdaBoost in Clinical Prediction Model
10.11783/j.issn.1002-3674.2023.06.004
- VernacularTitle:SMOTE_ENN结合AdaBoost在临床预测模型中的应用探析
- Author: Shuqi LI 1; Biao GUANG; Yufeng ZHAO
Author Information
1. School of Information Engineering, Hubei University of Chinese Medicine (430065)
- Keywords: SMOTE; ENN; AdaBoost; Clinical prediction model; Unbalanced data
- From: Chinese Journal of Health Statistics 2023;40(6):817-821
- Country: China
- Language:Chinese
- Abstract:
Objective: To explore the predictive performance of SMOTE_ENN hybrid sampling combined with the AdaBoost algorithm in classification models for unbalanced clinical data.
Methods: Grid search was used, and different sampling ratios were set. Using real data, four hybrid sampling methods (ROS_RUS, SMOTE_RUS, SMOTE_Tomek, and SMOTE_ENN) were each combined with the DT, SVM, and AdaBoost classification algorithms to build models, and their performance was compared. Recall, F1 score, and AUC were selected as evaluation metrics; 5-fold cross-validation was repeated three times and the results were averaged. Two additional UCI data sets were selected for external validation of the model.
Results: Among the 12 classification models, SMOTE_ENN hybrid sampling combined with AdaBoost performed best, with Recall, F1, and AUC values of 0.747, 0.751, and 0.776, respectively. The best sampling ratio was 50% SMOTE oversampling combined with 70% ENN undersampling.
Conclusion: SMOTE_ENN hybrid sampling combined with the AdaBoost model can effectively improve clinical outcome prediction on unbalanced data from HT patients, and selecting the best sampling ratio addresses the lack of a clear sampling rate in previous resampling work. After further validation on the public UCI data sets, the model can be applied more widely.
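The hybrid sampling idea named in the abstract can be illustrated with a minimal sketch: SMOTE adds interpolated minority samples, then ENN removes samples that disagree with their nearest neighbours. This is not the authors' implementation. The function names (`nearest`, `smote`, `enn`), the toy data, the neighbour counts, and the `n_synthetic` parameter are all assumptions made for illustration, and the sketch does not reproduce the 50%/70% sampling ratios, whose exact definition the abstract does not give. The AdaBoost classifier and the cross-validation step are omitted.

```python
import math
import random
from collections import Counter


def nearest(idx, points, k, candidates):
    """Indices of the k candidates closest to points[idx], excluding idx."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(X, y, minority, n_synthetic, k=5, seed=0):
    """Basic SMOTE: append n_synthetic points interpolated between a
    minority sample and one of its k nearest minority neighbours.
    Assumes the minority class has at least two samples."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    X_new, y_new = list(X), list(y)
    for _ in range(n_synthetic):
        i = rng.choice(min_idx)
        j = rng.choice(nearest(i, X, k, min_idx))
        gap = rng.random()  # position along the segment from X[i] to X[j]
        X_new.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
        y_new.append(minority)
    return X_new, y_new


def enn(X, y, k=3):
    """Edited Nearest Neighbours: keep a sample only if the majority label
    among its k nearest neighbours matches its own label.
    This sketch edits every class, not only the majority class."""
    all_idx = range(len(X))
    keep = []
    for i in all_idx:
        votes = Counter(y[j] for j in nearest(i, X, k, all_idx))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: 40 majority points (label 0), 8 minority points (label 1).
data_rng = random.Random(42)
X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(40)]
X += [(data_rng.gauss(2, 1), data_rng.gauss(2, 1)) for _ in range(8)]
y = [0] * 40 + [1] * 8

X_os, y_os = smote(X, y, minority=1, n_synthetic=16)  # oversample first
X_res, y_res = enn(X_os, y_os)                        # then clean with ENN
print(Counter(y), Counter(y_os), Counter(y_res))
```

In practice a maintained implementation is preferable to hand-written sampling code; for example, the imbalanced-learn package provides a `SMOTEENN` class in `imblearn.combine`. Whichever implementation is used, resampling should be applied only to the training folds during cross-validation so that the held-out fold keeps its original class distribution.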