Preliminary study of machine learning in the screening of proteinuria in rural areas of Shanxi province
10.3760/cma.j.cn441217-20221028-01041
- VernacularTitle:机器学习在山西省农村地区蛋白尿筛查中的初探
- Author: Yuanyue LU (1); Ziliang LI; Wangxin LI; Yanqin LIU; Rongshan LI; Xiaoshuang ZHOU
- Author Information: 1. Department of Nephrology, Fifth Clinical Medical College of Shanxi Medical University, Taiyuan 030012
- Keywords: Proteinuria; Machine learning; Kidney diseases; Risk factors; Shanxi province
- From: Chinese Journal of Nephrology 2023;39(7):491-498
- Country: China
- Language:Chinese
- Abstract:
Objective: To estimate the prevalence of proteinuria in rural areas of Shanxi province and to construct a machine learning-based risk prediction model for proteinuria.
Methods: This was a cross-sectional survey. Residents aged ≥30 years in rural areas of Shanxi province were screened from April to November 2019 using multi-stage stratified sampling, and data from questionnaires, physical examinations, and laboratory tests were collected. Proteinuria was defined as a urine albumin/creatinine ratio ≥30 mg/g, and its prevalence was calculated. Subjects were divided into a proteinuria group and a non-proteinuria group. Binary classification models distinguishing proteinuria from non-proteinuria were built with six algorithms: stacking ensemble logistic regression (SE-LR), logistic regression, support vector machine, decision tree, random forest, and extreme gradient boosting. The area under the receiver operating characteristic curve (AUC), accuracy, recall, and weighted F1 score were used to compare the predictive performance of the models. Finally, the predictive features of the model with the best overall performance were ranked by importance.
Results: A total of 8 869 rural residents were included, aged (58.59±9.49) years, of whom 3 872 (43.66%) were male and 4 997 (56.34%) were female. The prevalence of proteinuria in rural areas of Shanxi province was 13.49% (1 196/8 869). Blood pressure, pulse, body mass index, waist circumference, proportion of overweight or obesity, proportion of hypertension, proportion of moderate to heavy salt intake, glycosylated hemoglobin, urine pH, urinary specific gravity, proportion of positive urinary occult blood, proportion of positive urinary glucose, proportion of positive urinary ketone bodies, proportion of urinary red blood cell count ≥5/μl, proportion of urinary white blood cell count ≥10/μl, and urinary α1-microglobulin were all higher in the proteinuria group than in the non-proteinuria group (all P<0.05). The proportions of subjects with lack of exercise and with a drinking history were lower in the proteinuria group than in the non-proteinuria group (both P<0.05). The SE-LR model had the best overall performance: its AUC (0.736, 95% CI 0.719-0.746) was slightly lower than that of the logistic regression model (0.745, 95% CI 0.680-0.762), but it had the highest accuracy (0.844), recall (0.621), and weighted F1 score (0.801). In the SE-LR model, the 10 most important features, in descending order, were urinary α1-microglobulin, urinary occult blood, urinary glucose, urine pH, smoking history, overweight or obesity, body mass index, total cholesterol, glycosylated hemoglobin, and hypertension.
Conclusions: The prevalence of proteinuria is high in rural areas of Shanxi province. The machine learning-based risk prediction model can predict the risk of proteinuria and identify its risk factors, which, to a certain extent, provides a scientific basis for disease prevention, intervention, and treatment in community and clinical settings.
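The case definition and the evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names are hypothetical, the toy labels are made up for illustration, and it assumes that the reported F1 value is a support-weighted average over both classes and that recall refers to the proteinuria class; the abstract does not state either explicitly.

```python
from collections import Counter

ACR_THRESHOLD_MG_PER_G = 30.0  # cutoff stated in the abstract


def has_proteinuria(acr_mg_per_g: float) -> bool:
    """Label a subject as proteinuric when the urine ACR is >= 30 mg/g."""
    return acr_mg_per_g >= ACR_THRESHOLD_MG_PER_G


def prevalence(n_cases: int, n_total: int) -> float:
    """Proportion of screened subjects who meet the case definition."""
    return n_cases / n_total


def binary_metrics(y_true, y_pred):
    """Accuracy, positive-class recall, and support-weighted F1 (assumed form)."""
    pairs = Counter(zip(y_true, y_pred))
    tp, fn = pairs[(1, 1)], pairs[(1, 0)]
    fp, tn = pairs[(0, 1)], pairs[(0, 0)]

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    n = tp + fn + fp + tn
    f1_pos = f1(tp, fp, fn)  # proteinuria treated as the positive class
    f1_neg = f1(tn, fn, fp)  # non-proteinuria treated as the positive class
    return {
        "accuracy": (tp + tn) / n,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Each class F1 is weighted by the number of true members of that class.
        "weighted_f1": ((tp + fn) * f1_pos + (tn + fp) * f1_neg) / n,
    }


# Prevalence from the counts reported in the abstract.
print(f"{prevalence(1196, 8869):.2%}")

# Toy labels (1 = proteinuria, 0 = non-proteinuria), invented for illustration.
print(binary_metrics([1, 1, 0, 0, 0, 0, 1, 0], [1, 0, 0, 0, 1, 0, 1, 0]))
```

With imbalanced groups such as those reported here, a support-weighted F1 is dominated by the larger non-proteinuria class, which is one reason it can sit well above the recall for the proteinuria class.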