Construction of a risk prediction model for blood pressure abnormality in occupational populations based on longitudinal occupational health surveillance data
- VernacularTitle:基于连续职业健康检查数据的职业人群血压异常风险预测模型构建
- Author: Tengxiao SHAN; Jiming ZHANG; Tianyang SHEN; Zhijun ZHOU
- Publication Type:Investigation
- Keywords: occupational health examination; occupational population; blood pressure; machine learning; risk prediction model
- From: Journal of Environmental and Occupational Medicine 2026;43(4):435-442
- Country: China
- Language:Chinese
- Abstract:
Background The prevalence of chronic diseases among the Chinese occupational population is rising steadily, with hypertension and diabetes becoming important health concerns. Occupational health examinations (OHE) provide stable population coverage, standardized protocols, and fixed follow-up intervals, offering a robust data foundation for risk assessment. However, most existing hypertension prediction studies rely on cross-sectional data and mainly focus on clinical onset, failing to capture the dynamic progression and accumulation of individual risk.
Objective To construct a machine learning-based risk prediction model for blood pressure abnormality in occupational populations, providing a reference for health risk stratification and targeted health interventions.
Methods Longitudinal data from 2020 to 2023 were extracted from the occupational health examination database of an institution in Shanghai. After excluding individuals with hypertension in any of the first three years, 3710 workers who participated in 4 consecutive years of OHE were included. Six categories of blood pressure-related indicators from 2020 to 2022 were selected: basic information, lifestyles, occupational exposures, cardiovascular indicators, routine blood tests, and biochemical markers, comprising 30 variables in total. For feature engineering, continuous variables were processed to reflect three-year dynamic characteristics: level (mean), variability (variance), and temporal trend (slope). Categorical variables were incorporated using their 2022 values. Least absolute shrinkage and selection operator (LASSO) regression was applied to identify variables for inclusion in the machine learning models. Five models (decision tree, logistic regression, random forest, support vector machine, and XGBoost) were used to predict the risk of blood pressure abnormality in 2023, and a sensitivity analysis was then conducted. The optimal model was selected based on the area under the receiver operating characteristic curve (AUC). Feature importance and SHapley Additive exPlanations (SHAP) analyses were applied to interpret the final model.
Results Feature engineering transformed the 17 continuous variables into 51 secondary variables; from these and the 13 baseline variables, LASSO regression selected 24 variables. The logistic regression model achieved the best performance. Feature importance analysis indicated that the dynamic trajectory of blood pressure played a central role in the model, complemented by other biochemical and lifestyle indicators. SHAP analysis further showed that the model could not only identify risks of high-normal blood pressure and hypertension but also quantify the contribution of individual examination indicators to the predicted risk, supporting risk-stratified population management in occupational health.
Conclusion Based on an existing occupational health examination database and routinely collected data, this study developed a risk prediction model for blood pressure abnormalities in occupational populations. By incorporating dynamic change characteristics from longitudinal examinations, the study demonstrates the feasibility of using routine health data for early risk identification and stratified management. However, its applicability is limited by the availability of repeated examination data and the focus on specific occupational groups, so further validation across broader populations and diverse clinical scenarios is required.
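
The feature engineering step described in Methods (level, variability, and trend over 2020 to 2022) can be illustrated with a minimal sketch. This is not the authors' code: the indicator names, the example values, the use of sample variance (n - 1), and the ordinary least-squares slope against calendar year are all assumptions made for illustration, since the abstract does not specify them.

```python
from statistics import mean, variance

YEARS = (2020, 2021, 2022)  # the three predictor years named in the abstract


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def dynamic_features(name, values, years=YEARS):
    """Summarise one continuous indicator measured once per year."""
    return {
        f"{name}_mean": mean(values),               # level
        f"{name}_var": variance(values),            # variability; sample variance is an assumption
        f"{name}_slope": ols_slope(years, values),  # temporal trend per year
    }


# Hypothetical worker record; the values are illustrative, not study data.
record = {"sbp": [118, 122, 126], "fasting_glucose": [5.0, 5.2, 5.1]}

features = {}
for indicator, series in record.items():
    features.update(dynamic_features(indicator, series))

# Each continuous indicator yields three summaries, so the 17 continuous
# variables reported in Results give 17 * 3 = 51 secondary variables.
n_secondary = 17 * 3
```

Each continuous indicator contributes three derived columns, which is consistent with the 51 secondary variables reported for 17 continuous variables; the categorical variables would be appended using their 2022 values before variable selection.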
