Application of random forest algorithm and logistic regression in predicting the risk of macrosomia
10.3969/j.issn.1006-2483.2022.03.004
- VernacularTitle:随机森林算法和logistic回归在预测巨大儿发病风险中的应用研究
- Author:Xuan LIU 1; Ruiyi LIU 1; Yimin QU 1; Yuping WANG 1; Yu JIANG 1
- Author Information:
1. Department of Epidemiology and Biostatistics, School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100730, China
- Publication Type:Journal Article
- Keywords:Macrosomia; Maternal and child health; Machine learning; Prediction model
- From:Journal of Public Health and Preventive Medicine 2022;33(3):17-21
- Country:China
- Language:Chinese
- Abstract:
Objective: To establish macrosomia risk prediction models based on a cohort study and to compare their performance.
Methods: The subjects were pregnant women enrolled in the Chinese Pregnant Women Cohort Study. General demographic information and clinical data were collected through questionnaires and physical examinations, and neonatal outcomes were obtained by follow-up. The dataset was divided into a training set and a test set at a 3:1 ratio. Multivariate logistic regression (LR) and the random forest algorithm (RF) were used to construct macrosomia risk prediction models on the training set, and the models were validated on the test set. Predictive performance was evaluated by the Kappa value and the area under the receiver operating characteristic (ROC) curve (AUC).
Results: Among 5544 pregnant women, 397 delivered macrosomic infants, an incidence of 7.16%. Of these 397 women, 10.08% (40/397) were over 35 years old, 27.46% (109/397) were overweight or obese, and 60.96% (242/397) had excessive gestational weight gain (GWG). On the test set, the LR model achieved an accuracy of 0.716, a sensitivity of 0.719, a specificity of 0.715, a Kappa value of 0.428, a Youden index of 0.393, and an AUC of 0.796 (95% CI: 0.777-0.815). The RF model achieved an accuracy of 0.819, a sensitivity of 0.782, a specificity of 0.846, a Kappa value of 0.629, a Youden index of 0.439, and an AUC of 0.897 (95% CI: 0.883-0.910).
Conclusion: Both models predicted macrosomia satisfactorily. The random forest model showed better predictive performance in this cohort, whereas multivariate logistic regression can directly explain the factors influencing macrosomia. Future work should combine the strengths of the two models so that they can play a greater role in macrosomia risk prediction.
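
The evaluation measures named in the abstract (accuracy, sensitivity, specificity, Kappa, Youden index, AUC) can be illustrated with a minimal sketch using their standard textbook definitions. The abstract does not say how the authors computed them, so this is not their procedure. The names `Confusion`, `summarize`, and `auc` are hypothetical, as are the counts and scores in the usage lines, which are not study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 table for a binary classifier (hypothetical helper)."""
    tp: int  # macrosomia predicted and observed
    fp: int  # macrosomia predicted, not observed
    fn: int  # macrosomia missed
    tn: int  # non-macrosomia correctly predicted


def summarize(c: Confusion) -> dict:
    """Standard threshold-based measures from a confusion table."""
    n = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / n
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    # Youden index: sensitivity + specificity - 1
    youden = sensitivity + specificity - 1
    # Cohen's kappa: agreement beyond what the marginal totals give by chance
    p_yes = ((c.tp + c.fp) / n) * ((c.tp + c.fn) / n)
    p_no = ((c.fn + c.tn) / n) * ((c.fp + c.tn) / n)
    expected = p_yes + p_no
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": youden,
        "kappa": kappa,
    }


def auc(scores_pos, scores_neg) -> float:
    """AUC as the probability that a positive case outscores a negative
    case, counting ties as one half (Mann-Whitney form)."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical counts and predicted risks, for illustration only.
metrics = summarize(Confusion(tp=40, fp=20, fn=10, tn=30))
print({name: round(value, 3) for name, value in metrics.items()})
print(round(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.4]), 3))
```

The pairwise AUC loop is quadratic in the number of cases. It is adequate for a small illustration, while a rank-based computation would be preferable for a cohort of several thousand.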