Effectiveness of automated machine learning models in predicting the risk of preeclampsia in the first trimester
10.3760/cma.j.cn115624-20220512-00355
- VernacularTitle:自动机器学习模型预测孕早期子痫前期风险的效果
- Author:Hongbo CHEN¹; Hong LI; Chunmei ZHAO; Shaoyun XIE; Chunmei JIA
- Author Information:
1. School for Pregnant Women, Jinan Second Maternal and Child Health Hospital, Jinan 271100, China
- Keywords:
Preeclampsia;
Pregnancy trimester, first;
Machine learning;
Prediction model;
Screening
- From:
Chinese Journal of Health Management
2022;16(8):553-560
- Country:China
- Language:Chinese
- Abstract:
Objective: To explore the value of automated machine learning (autoML) models in predicting the risk of preeclampsia in the first trimester.
Methods: From January 2017 to October 2020, 2 180 women with singleton pregnancies who were registered at Jinan Second Maternal and Child Health Hospital and underwent a prenatal examination at 12 weeks of gestation were enrolled. They were divided into a preeclampsia group (103 cases) and a control group (2 077 cases) according to whether preeclampsia occurred. Clinical data and hematological indices were compared between the two groups, and the correlation between each index and the risk of preeclampsia was analyzed. All participants were randomly split into a training set and a test set at a ratio of 7:3. The AutoGluon autoML framework was used to build multiple machine learning models, which were trained and cross-validated on the training set to compare their accuracy.
The importance of each index in the autoML model was analyzed. The autoML model and a logistic regression model were each used to predict the risk of preeclampsia in the test set, and receiver operating characteristic (ROC) curves were used to evaluate their predictive performance.
Results: In the preeclampsia group, age, pre-pregnancy body mass index, body mass index at 12 weeks of gestation, waist circumference at 12 weeks of gestation, the proportion with a drinking history, high-sensitivity C-reactive protein (hs-CRP), triglyceride, low-density lipoprotein cholesterol (LDL-C), aspartate aminotransferase (AST), platelet distribution width (PDW), mean platelet volume, thyroid stimulating hormone (TSH) and β-human chorionic gonadotropin (β-HCG) were all significantly higher than in the control group (all P<0.05), whereas free triiodothyronine (free T3), free thyroxine (free T4), placental growth factor (PlGF), soluble fms-like tyrosine kinase-1 (sFlt-1) and pregnancy-associated plasma protein-A (PAPP-A) were all significantly lower (all P<0.05). Correlation analysis showed that pre-pregnancy body mass index, body mass index at 12 weeks of gestation, waist circumference at 12 weeks of gestation, hs-CRP, triglyceride, AST, TSH, free T3, free T4, β-HCG, PlGF, sFlt-1 and PAPP-A were more strongly correlated with preeclampsia risk, while the correlations among the indices themselves were weak. The autoML algorithm constructed 18 models in 8 categories; the FastAI-based neural network (_L2) had the highest accuracy in both the training set (0.963) and the validation set (0.971). TSH, LDL-C, PDW, waist circumference at 12 weeks of gestation, sFlt-1 and AST were the more important indicators in the model, while free T4, total cholesterol, gravidity, drinking history, parity and family history of hypertension were less important.
The area under the ROC curve of the autoML model for predicting the risk of preeclampsia in the first trimester was significantly higher than that of the logistic regression model (0.984 vs 0.765, P=0.002), whereas the prediction accuracy of the two models did not differ significantly in the training set (P>0.05). In the test set, both the accuracy and the sensitivity of the autoML model were significantly higher than those of the logistic regression model (accuracy: 99.54% vs 98.32%; sensitivity: 93.75% vs 75.00%; both P<0.05).
Conclusions: First-trimester TSH, LDL-C, PDW, waist circumference, sFlt-1 and AST show some correlation with the risk of preeclampsia. An autoML model built on first-trimester indicators has high predictive value for the risk of preeclampsia.
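
For readers less familiar with the three metrics reported in the Results (accuracy, sensitivity and area under the ROC curve), the following is a minimal Python sketch of how each can be computed from true labels and predicted scores. It is an illustration only and not the authors' code: the function names, the 0.5 decision threshold and the toy data are all assumptions, and none of the numbers come from the study.

```python
from typing import Sequence, Tuple


def accuracy_and_sensitivity(y_true: Sequence[int],
                             y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, sensitivity) for binary labels, with 1 = preeclampsia."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls correctly cleared
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases missed
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)
    return accuracy, sensitivity


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen case receives a higher score than a randomly chosen control
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 4 cases, 6 controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc, sens = accuracy_and_sensitivity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.2f}, sensitivity={sens:.2f}, AUC={auc:.3f}")
```

Because the control group is much larger than the preeclampsia group, accuracy alone can look high even when many cases are missed, which is why sensitivity and the area under the ROC curve are reported alongside it.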