Contribution of the large-scale population cohort in disease risk prediction model study: taking United Kingdom Biobank as an example

Chenxu ZHU; Yuxin SONG; Yuantao HAO; Feng CHEN; Yongyue WEI

VernacularTitle:大型人群队列在疾病风险预测模型研究中的作用：以英国生物银行为例
Author: Chenxu ZHU ¹ ; Yuxin SONG ; Yuantao HAO ; Feng CHEN ; Yongyue WEI
Author Information

1. Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing 211166, China
Keywords: Large-scale population cohort; Disease risk prediction model; United Kingdom Biobank; Data sharing
From: Chinese Journal of Epidemiology 2024;45(10):1433-1440
Country: China
Language: Chinese
Abstract: Disease risk prediction models are the basis of precision prevention and an essential reference for clinical treatment decisions. Developing such models requires large amounts of high-quality data, and large population cohort studies are an important source of these data. The United Kingdom Biobank (UKB), a mega-population cohort and biobank, has played an essential role in research on disease etiology and on disease prevention and control, owing to its rich baseline and follow-up data and its concept and mechanisms of global data sharing. This study followed PRISMA guidelines and included 210 articles with corresponding authors from 18 countries, of which 58 (27.62%) were from the UKB. A total of 491 disease risk prediction models were extracted, covering cancer, cardiovascular and cerebrovascular diseases, endocrine and metabolic diseases, respiratory diseases, and other diseases, together with their subgroups. Of these, 132 were developed in the UKB without validation, 183 were developed in the UKB with internal validation, 17 were developed in the UKB with external validation, and 159 were developed in external data and validated in the UKB. A total of 188 models (38.29%) used only macro variables, and 303 models (61.71%) combined macro and micro variables. Model construction methods included survival outcome models, logistic regression, and machine learning. Survival outcome models were dominated by Cox proportional hazards regression, with a few models considering competing risks, accelerated failure time models, or different baseline hazard functions. Machine learning methods included random forest, XGBoost, CatBoost, support vector machine, convolutional neural network, and others. The UKB is an essential resource for risk prediction modeling studies across multiple diseases.
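The model counts reported in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the dictionary keys and the helper `shares` are names chosen here, not terms from the article, and the numbers are copied from the abstract.

```python
# Counts as reported in the abstract, grouped by how each model
# was developed and validated (labels are illustrative, not the authors').
model_counts = {
    "UKB development, no validation": 132,
    "UKB development, internal validation": 183,
    "UKB development, external validation": 17,
    "external development, UKB validation": 159,
}

# Counts by the type of predictors used, also from the abstract.
variable_counts = {"macro only": 188, "macro + micro": 303}


def shares(counts):
    """Return each category's percentage of the total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


total_models = sum(model_counts.values())
category_shares = shares(model_counts)
variable_shares = shares(variable_counts)

if __name__ == "__main__":
    print("total models:", total_models)
    for name, pct in category_shares.items():
        print(f"{name}: {pct}%")
    for name, pct in variable_shares.items():
        print(f"{name}: {pct}%")
```

Both groupings sum to the same total of 491 models, and the variable-type percentages reproduce the 38.29% and 61.71% given in the abstract.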