Machine learning prediction model of diabetic kidney disease in different regions of Gansu province
10.3969/j.issn.1006-6187.2025.01.003
- VernacularTitle:甘肃省不同地区糖尿病肾脏疾病的机器学习预测模型的研究
- Author: Jianning YANG¹; Doudou HONG; Yang LI; Jing YU; Fan YANG; Ziying WEN; Wenjun QIAO; Jing ZHANG; Qi ZHANG
- Author Information:
1. First Clinical Medical College, Gansu University of Chinese Medicine, Lanzhou 730000, China
- Publication Type:Journal Article
- Keywords: Diabetic kidney disease; Diabetes mellitus, type 2; Machine learning; Prediction model
- From: Chinese Journal of Diabetes 2025;33(1):8-15
- Country: China
- Language:Chinese
- Abstract:
Objective: To construct machine learning (ML) prediction models for diabetic kidney disease (DKD) in patients with type 2 diabetes mellitus (T2DM) in the plain-sand and loess hilly areas of Gansu Province, and to analyze the interpretability of the models.

Methods: Multi-stage stratified random sampling was used to collect data on T2DM patients in the two areas. After key feature screening, eight ML prediction models were constructed for the risk of DKD in the two areas. Model performance was evaluated with the receiver operating characteristic (ROC) curve, accuracy and F1 score, and the Shapley additive explanations (SHAP) algorithm was used for model interpretation.

Results: A total of 1599 patients with T2DM were enrolled. In the plain-sand area, ten variables were selected for model construction after feature screening. Among the eight models, the gradient boosting decision tree (GBDT) model performed best, with an area under the curve (AUC) of 0.972, an accuracy of 0.949 and an F1 score of 0.884 on the test set. In the loess hilly area, 12 variables were included and the best model was the random forest (RF), with a test-set AUC of 0.966, an accuracy of 0.951 and an F1 score of 0.861. SHAP analysis showed that, in addition to serum creatinine, age, LDL-C, HbA1c, DM duration, serum uric acid and urinary microalbumin were also closely related to a high risk of DKD.

Conclusions: The GBDT and RF models predict the occurrence of DKD well in the two areas. They can be used to screen populations at high risk of DKD and to explore potential risk factors in depth.
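
The abstract reports AUC, accuracy and F1 for the test sets but does not say how these were computed. As a minimal sketch, assuming binary DKD labels (1 = DKD, 0 = no DKD), model scores in [0, 1] and a 0.5 decision threshold (none of which is stated in the record), the three metrics can be derived as follows. The function names and the toy data are hypothetical and unrelated to the study data.

```python
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def accuracy_and_f1(
    y_true: Sequence[int],
    scores: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the study
) -> Tuple[float, float]:
    """Threshold the scores, then compute accuracy and F1 from the
    confusion-matrix counts."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy example: 3 positive and 5 negative cases (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]

auc = roc_auc(y_true, scores)
acc, f1 = accuracy_and_f1(y_true, scores)
print(f"AUC={auc:.3f} accuracy={acc:.3f} F1={f1:.3f}")
```

With imbalanced classes, accuracy can stay high while F1 drops, because F1 ignores true negatives. This may be why the reported F1 values (0.884 and 0.861) sit below the accuracies (0.949 and 0.951), although the record gives no class counts to confirm it.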