Comparison of logistic regression and machine learning algorithm in establishment of pre-eclampsia prediction model
10.3760/cma.j.cn113903-20230928-00241
- VernacularTitle:Logistic回归法和机器学习算法构建子痫前期预测模型的比较
- Author: Xingneng XU (1); Shengzhu CHEN; Jiayi ZHOU; Si YANG; Xuwei WANG; Bolan YU
Author Information
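1. Department of Obstetrics and Gynecology, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou 510150, China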
- Keywords:
Preeclampsia;
Prediction model;
Logistic regression model;
Random forest algorithm;
eXtreme Gradient Boosting algorithm
- From:
Chinese Journal of Perinatal Medicine
2024;27(7):572-581
- Country: China
- Language:Chinese
- Abstract:
Objective: To construct preeclampsia (PE) prediction models from hospital electronic medical information and clinical laboratory data using logistic regression (LR) and machine learning algorithms, and to compare their predictive performance.

Methods: The study was based on the Rouji Pregnancy Test Database and the perinatal data of women who visited the Third Affiliated Hospital of Guangzhou Medical University from January 1, 2012, to December 31, 2019. Drawing on clinical treatment guidelines and related literature, 28 clinical indicators from 2,736 pregnant women at 24 to 28 weeks of gestation were selected after data integration and used to build the dataset for PE prediction modeling. Patients diagnosed with PE formed the PE group (n=245); 255 women without PE were selected from the remainder by undersampling to form the control group. The Random Forest (RF) algorithm, the eXtreme Gradient Boosting (XGB) algorithm, and an LR model were each used to develop a predictive model for PE. After model construction, external validation was carried out using data from an independent prospective cohort study on PE conducted from June 2019 to December 2022, from which 38 PE cases and 80 controls were chosen. Model performance was evaluated by accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (AUC).

Results: Across the three predictive models, the influential indicators were uric acid, creatinine, maternal age, early pregnancy body mass index, urea, triglycerides, red blood cell count, eosinophil count, total cholesterol, neutrophil count, urine protein, alanine aminotransferase, and urine occult blood. In the training set, the AUCs of the RF, XGB, and LR models were 0.851 (95% CI: 0.730-0.891), 0.955 (95% CI: 0.865-0.987), and 0.884 (95% CI: 0.767-0.923), respectively; in the test set, they were 0.845 (95% CI: 0.723-0.868), 0.907 (95% CI: 0.791-0.919), and 0.851 (95% CI: 0.755-0.893), respectively. In the test set, the accuracy, sensitivity, and specificity were 0.803, 0.607, and 0.958 for the RF model; 0.864, 0.790, and 0.927 for the XGB model; and 0.832, 0.661, and 0.971 for the LR model. In the external validation of the RF, XGB, and LR models, the accuracies were 0.822, 0.814, and 0.763; the sensitivities were 0.737, 0.789, and 0.605; and the specificities were 0.863, 0.825, and 0.838, respectively. Among them, the XGB model showed the highest Youden's index (0.614).

Conclusion: Compared with traditional methods of model construction, machine learning algorithms can establish more effective PE prediction models using real clinical data.
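
The Youden's index quoted for the XGB model can be checked against the external-validation sensitivity and specificity reported above, using the standard definition J = sensitivity + specificity - 1. The following is a minimal sketch for illustration only, not the authors' code; the dictionary layout and the names `external_validation` and `youden_index` are assumptions introduced here.

```python
# Sensitivity and specificity from the external validation,
# copied from the abstract (RF, XGB, LR models).
external_validation = {
    "RF": {"sensitivity": 0.737, "specificity": 0.863},
    "XGB": {"sensitivity": 0.789, "specificity": 0.825},
    "LR": {"sensitivity": 0.605, "specificity": 0.838},
}


def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


# Compute J for each model, rounded to three decimals as in the abstract.
youden = {
    name: round(youden_index(**metrics), 3)
    for name, metrics in external_validation.items()
}

# Model with the largest J.
best_model = max(youden, key=youden.get)

for name, j in youden.items():
    print(f"{name}: J = {j:.3f}")
print("Highest Youden's index:", best_model)
```

Under these inputs the XGB model has the largest J, which agrees with the 0.614 stated in the abstract.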