Early Recurrence Prediction Model for DLBCL based on Gaussian Mixture Model Bi-directional Clustering Resampling and Random Forest
10.11783/j.issn.1002-3674.2025.01.002
- VernacularTitle:基于高斯混合模型双向聚类重采样和随机森林构建DLBCL早期复发预测模型
- Author:Junxia WANG 1; Yanbo ZHANG; Hongmei YU
Author Information
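1. Department of Health Statistics, School of Public Health, Shanxi Medical University (030001); Shanxi Provincial Key Laboratory of Major Disease Risk Assessment, School of Public Health, Shanxi Medical University (030001)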
- Publication Type:Journal Article
- Keywords:
Class imbalance;
Gaussian mixture model clustering oversampling;
Random forest;
Recurrence prediction;
Diffuse large B-cell lymphoma
- From:
Chinese Journal of Health Statistics
2025;42(1):7-11,17
- Country:China
- Language:Chinese
- Abstract:
Objective: To apply a class-imbalance treatment method that simultaneously addresses between-class imbalance and within-class imbalance in both the minority and the majority class, and to combine it with a random forest (RF) classifier to predict early recurrence in patients with diffuse large B-cell lymphoma (DLBCL), providing a reference for their treatment. Methods: First, the data were processed with a class-imbalance method based on Gaussian mixture model (GMM) bi-directional clustering resampling (GMM-GMM), and the result was compared with ROS, SMOTE, Borderline-1 SMOTE, Borderline-2 SMOTE, GMM oversampling, GMM undersampling, SMOTE+RUS, SMOTE+GMM and GMM+RUS. Next, to verify the performance of RF, logistic regression and decision tree models were used as controls. Finally, the models were evaluated in terms of discrimination and calibration. Results: The RF model with GMM-GMM resampling achieved relatively optimal classification performance (accuracy=0.79, AUC=0.87, sensitivity=0.71, specificity=0.87, G-means=0.79, MSE=0.21). Conclusion: GMM-GMM outperformed the other, traditional resampling methods, and combining it with the RF model achieved relatively good classification results for predicting early recurrence in DLBCL patients.
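
The reported discrimination metrics can be related to one another with a short calculation. The sketch below is illustrative only: the function name `classification_metrics` and the confusion-matrix counts are hypothetical, chosen solely so that sensitivity and specificity match the reported 0.71 and 0.87; they are not the study's data. It assumes the usual definition of G-means as the geometric mean of sensitivity and specificity, which the abstract does not spell out.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Compute basic discrimination metrics from confusion-matrix counts.

    tp/fn: recurrence cases predicted correctly / missed.
    tn/fp: non-recurrence cases predicted correctly / falsely flagged.
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # Assumed definition: geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "g_mean": g_mean,
    }


# Hypothetical counts (100 cases per class), NOT taken from the study.
metrics = classification_metrics(tp=71, fn=29, tn=87, fp=13)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these assumed counts the geometric mean of 0.71 and 0.87 rounds to 0.79, which agrees with the reported G-means value. Unlike accuracy, this measure falls sharply when either class is predicted poorly, which is why it is commonly reported for imbalanced data.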