Boosting prediction of occupational stress among manufacturing employees by reconstructing cumulative fatigue features with Bayesian sparse autoencoder
- VernacularTitle:基于贝叶斯稀疏自编码器的蓄积性疲劳特征重构对机器学习模型预测制造业员工职业紧张的提升作用
- Author: Tao SONG 1; Yuting ZHOU 2; Xinyi LU 2; Xinkai WEI 2; Qingxin MENG 1; Jianlin LOU 2; Hongchang ZHOU 2; Jin WANG 3; Shuang LI 3
- Publication Type:Investigation
- Keywords: occupational stress; manufacturing; cumulative fatigue; machine learning; Bayesian sparse autoencoder
- From: Journal of Environmental and Occupational Medicine 2025;42(12):1446-1455
- Country:China
- Language:Chinese
- Abstract:
Background Occupational stress has emerged as a critical public health concern affecting the physical and mental well-being of workers in the manufacturing sector. However, researchers typically evaluate its core driver, cumulative fatigue, using a crude binary “present/absent” variable, thereby overlooking the high-dimensional complexity and heterogeneity inherent in fatigue characteristics. This oversimplification constrains both the precision and the predictive performance of occupational stress risk assessment models.
Objective To apply a data-driven approach to survey data on cumulative fatigue among manufacturing employees in order to derive a new classification of cumulative fatigue, and then to use this classification to develop and validate an occupational stress prediction model, with the ultimate aim of enhancing the accuracy and effectiveness of occupational stress assessment.
Methods Cross-sectional survey data on 3871 manufacturing employees collected in 2021 were obtained from the “Long working hours exposure and its adverse health effect risk assessment” program of the National Institute for Occupational Health and Poison Control, Chinese Center for Disease Control and Prevention. Occupational stress was assessed using the Core Occupational Stress Measurement Scale, and cumulative fatigue was evaluated using the Worker’s Self-Diagnostic Questionnaire for Fatigue Accumulation. The Boruta method was applied to screen core variables from 20 factors influencing occupational stress. Dimensionality reduction of the cumulative fatigue features was performed using a Bayesian sparse autoencoder (BSAE); clustering methods were then compared, and the selected method was applied to classify the reduced features into multiple categories. Six machine learning classifiers (logistic regression, support vector machine, decision tree, random forest, adaptive boosting, and light gradient boosting machine [LightGBM]) were then used to construct and compare two versions of the occupational stress prediction model, each combining cumulative fatigue with the ten other core variables: one using the original binary cumulative fatigue label, and the other using the novel fatigue classification proposed here.
Results The positive rate of occupational stress among the manufacturing employees was 38.9%. The Boruta method identified 11 core variables of occupational stress: depressive symptoms, cumulative fatigue, age, length of service, tenure in current position, average weekly working hours, average daily overtime hours, average monthly income, low levels of exercise, life satisfaction, and sleep quality. The BSAE reduced the original cumulative fatigue factors to 12 dimensions, and K-means clustering grouped the reduced fatigue features into three categories: no fatigue, moderate fatigue, and severe fatigue. Six occupational stress prediction models were constructed using the three-category cumulative fatigue labels and the ten additional factors. The LightGBM model performed best, achieving an area under the curve (AUC) of 0.78, an accuracy of 0.77, and an F1 score of 0.72. Compared with the best model using the traditional binary cumulative fatigue label (AUC 0.72, accuracy 0.66, F1 score 0.65), these metrics improved by 0.06, 0.11, and 0.07 (6, 11, and 7 percentage points), respectively.
Conclusion Cumulative fatigue is a significant predictor of occupational stress among manufacturing employees. Applying dimensionality reduction combined with three-class cluster analysis to characterize cumulative fatigue improves the performance of occupational stress prediction models. From a data-driven perspective, machine learning methods demonstrate strengths in processing complex datasets, capturing nonlinear relationships, and generating predictions based on key variables. This study therefore confirms that refined processing of core factors substantially enhances the predictive capability of machine learning models in assessing occupational stress risk in the manufacturing workforce.
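The clustering step described in the abstract (grouping 12-dimensional reduced fatigue features into three categories with K-means) can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, the function names (`kmeans`, `farthest_point_init`, `label_by_severity`) are hypothetical, the deterministic farthest-point seeding is a convenience chosen here, and ranking clusters by mean centroid value to name them is an assumption made only for illustration, since the abstract does not say how the three clusters were mapped to severity levels.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic seeding (a choice made for this sketch): start at the
    first point, then repeatedly add the point farthest from its nearest
    already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(nxt)
    return [list(c) for c in centroids]


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm: alternate assignment and centroid update
    until the assignment stops changing."""
    centroids = farthest_point_init(points, k)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def label_by_severity(assignment, centroids, names):
    """ASSUMPTION for illustration only: a larger mean centroid value means
    more fatigue. Real autoencoder dimensions need not be ordered this way."""
    order = sorted(range(len(centroids)),
                   key=lambda j: sum(centroids[j]) / len(centroids[j]))
    name_of = {cluster: names[rank] for rank, cluster in enumerate(order)}
    return [name_of[a] for a in assignment]


# Synthetic stand-in for the 12-dimensional reduced features (not study data).
rng = random.Random(0)
N_DIMS = 12
GROUP_MEANS = [0.0, 3.0, 6.0]
N_PER_GROUP = 50
points = [
    [rng.gauss(mu, 0.5) for _ in range(N_DIMS)]
    for mu in GROUP_MEANS
    for _ in range(N_PER_GROUP)
]

FATIGUE_NAMES = ["no fatigue", "moderate fatigue", "severe fatigue"]
assignment, centroids = kmeans(points, k=3)
labels = label_by_severity(assignment, centroids, FATIGUE_NAMES)
print(Counter(labels))
```

In a pipeline like the one described, the resulting three-level label would replace the binary fatigue variable as one input to the downstream classifiers. In practice a library implementation such as scikit-learn's `KMeans` would normally be preferred over a hand-written loop.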
