Development and evaluation of a machine learning prediction model for large for gestational age
10.3760/cma.j.cn112338-20210824-00677
- VernacularTitle: A risk prediction model for large for gestational age infants based on machine learning algorithms
- Author: Xi BAI (1); Yunyun LUO; Zhibo ZHOU; Mingliang SU; Liuqing YANG; Shi CHEN; Hongbo YANG; Huijuan ZHU; Hui PAN
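- Author Information:
1. Chinese Academy of Medical Sciences / Peking Union Medical College / Department of Endocrinology, Peking Union Medical College Hospital / Key Laboratory of Endocrinology, National Health Commission / State Key Laboratory of Complex Severe and Rare Diseases, Beijing 100730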
- Keywords: Machine learning; Large for gestational age; Risk prediction model
- From: Chinese Journal of Epidemiology 2021;42(12):2143-2148
- Country: China
- Language: Chinese
- Abstract:
Objective: To develop and validate a predictive model for large for gestational age (LGA) using machine learning (ML) algorithms, and to compare its performance with that of a traditional logistic regression model.
Methods: Data were obtained from the National Free Preconception Health Examination Project in China, which was carried out in 220 counties of 31 provinces from 2010 to 2012 and covered all rural couples planning a pregnancy. The study included all women of childbearing age who delivered between 24 and 42 weeks of gestation, together with their newborns. Ten ML algorithms were used to build LGA prediction models, and the predictive performance of each model was evaluated.
Results: A total of 104 936 newborns were included, of whom 54 856 (52.3%) were boys and 50 080 (47.7%) were girls. The incidence of LGA was 11.7% (12 279 newborns). The class imbalance between the LGA and non-LGA groups was addressed with undersampling, after which the overall performance of the ML models improved significantly. The CatBoost model achieved the highest area under the receiver operating characteristic curve (AUC), 0.932, whereas the logistic regression model performed worst, with an AUC of 0.555.
Conclusions: For predicting the risk of LGA in pregnancy, ML algorithms outperform traditional logistic regression. CatBoost performed best among the ML algorithms evaluated and merits further investigation.
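The abstract reports 12 279 LGA newborns out of 104 936, so non-LGA newborns (92 657) outnumber LGA newborns by roughly 7.5 to 1. This imbalance is what the undersampling step targets. The abstract does not say which undersampling scheme the authors used, so the sketch below assumes plain random undersampling of the majority class and computes AUC with the standard pairwise (Mann-Whitney) definition. The function names `random_undersample` and `roc_auc` are hypothetical, and the toy labels and scores are invented for illustration; they are not study data.

```python
import random
from typing import List, Sequence


def random_undersample(labels: Sequence[int], seed: int = 0) -> List[int]:
    """Return sorted row indices of a class-balanced subset (assumed scheme).

    Keeps every sample of the smaller class and draws, without replacement,
    the same number of samples from the larger class. Labels are 0/1
    (1 = LGA, 0 = non-LGA in this illustration).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = minority + rng.sample(majority, len(minority))
    return sorted(kept)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive is scored
    higher than a randomly chosen negative (ties count as 1/2).

    This pairwise form is O(n_pos * n_neg): fine for illustration,
    too slow for ~100 000 rows.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Class counts reported in the abstract.
    n_total, n_lga = 104_936, 12_279
    n_non_lga = n_total - n_lga
    print(f"LGA incidence: {n_lga / n_total:.1%}")         # 11.7%
    print(f"non-LGA : LGA = {n_non_lga / n_lga:.2f} : 1")  # 7.55 : 1

    # Toy labels and scores, invented purely for illustration.
    y = [0, 0, 0, 0, 1, 1]
    s = [0.10, 0.40, 0.35, 0.80, 0.90, 0.60]
    print(random_undersample(y, seed=1))  # both positives plus two negatives
    print(roc_auc(y, s))                  # 0.875
```

In practice, scikit-learn (`sklearn.metrics.roc_auc_score`) and imbalanced-learn (`RandomUnderSampler`) provide these operations. Undersampling is normally applied to the training split only, so that AUC is estimated on data with the natural class ratio; the abstract does not state how the authors handled this.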