1.Development and validation of risk assessment models for abnormal lung function in coal workers based on machine learning
Yaxin ZHU ; Keyun GUO ; Chen YANG ; Yixuan ZHANG ; Hao ZHU ; Yulan JIN
Chinese Journal of Industrial Hygiene and Occupational Diseases 2025;43(5):332-337
Objective: To analyze the factors influencing the lung function of coal miners, identify the optimal combination of indicators for evaluating lung function, develop a machine-learning risk assessment model, and provide personalized risk assessment for workers.
Methods: In June 2023, male underground workers who had undergone occupational health examinations at a coal mine in North China from July to August 2018 were selected by cluster sampling. Their health examination results and occupational environment data were collected. A total of 3,320 coal miners were included. The subjects were randomly divided into a training set (2,324 workers) and a validation set (996 workers) at a ratio of 7∶3, and the balance between the two sets was tested. LASSO regression in R 4.2.2 was used to screen important variables, and the model input variables were finalized with reference to the relevant literature. Logistic regression, random forest, support vector machine, and XGBoost models were built in Python 3.8. Discrimination was assessed with accuracy, sensitivity, specificity, F1 score, the ROC curve, and AUC. Calibration was assessed with the Brier score, log loss, and calibration curves. Clinical utility was further analyzed by decision curve analysis (DCA).
Results: Of the 3,320 coal miners, 856 (25.78%) had abnormal lung function. XGBoost was identified as the optimal model, achieving a training set accuracy of 87.39%, sensitivity of 86.60%, specificity of 87.67%, F1 score of 0.779, AUC of 0.945, Brier score of 0.071, and log loss of 0.267, with good agreement on the calibration curve.
Conclusion: The XGBoost model outperforms the other models in predictive performance and has high application value. Interpreted with the Shapley Additive Explanations (SHAP) method, it provides a reliable basis for preventing abnormal lung function in coal miners.
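The 7∶3 split and the discrimination and calibration metrics listed above can be sketched as follows. This is a minimal, standard-library illustration, not the authors' code: the helper names, the seed, the 0/1 outcome coding, and the 0.5 classification threshold are all assumptions, since the abstract does not report them.

```python
import math
import random


def train_validation_split(n_subjects, train_fraction=0.7, seed=2023):
    """Randomly assign subject indices to a training and a validation set (hypothetical helper)."""
    indices = list(range(n_subjects))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary assumption
    n_train = round(n_subjects * train_fraction)
    return indices[:n_train], indices[n_train:]


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Discrimination and calibration metrics for a binary outcome.

    y_true: 0/1 labels (1 = abnormal lung function, assumed coding)
    y_prob: predicted probabilities of the positive class
    threshold: assumed cut-off for turning probabilities into class labels
    """
    eps = 1e-15  # clip probabilities so log(0) never occurs in the log loss
    tp = fp = tn = fn = 0
    brier = 0.0
    log_loss = 0.0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
        brier += (p - y) ** 2  # squared error of the predicted probability
        q = min(max(p, eps), 1 - eps)
        log_loss -= y * math.log(q) + (1 - y) * math.log(1 - q)
    n = len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "brier": brier / n,
        "log_loss": log_loss / n,
    }


train_idx, valid_idx = train_validation_split(3320)
print(len(train_idx), len(valid_idx))  # 2324 996
```

A lower Brier score and lower log loss both indicate better calibration, which is why the abstract reports them alongside the threshold-dependent metrics.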
2.Study on work-related musculoskeletal disorders and influencing factors of underground workers in a coal mine
Yaxin ZHU ; Kun SUN ; Yixuan ZHANG ; Chen YANG ; Keyun GUO ; Yulan JIN
Chinese Journal of Industrial Hygiene and Occupational Diseases 2025;43(8):600-605
Objective: To investigate the occurrence of work-related musculoskeletal disorders (WMSDs) among underground coal mine workers, identify risk factors for WMSDs, and provide scientific evidence for their prevention and treatment.
Methods: In March 2024, on-the-job workers who had completed questionnaire surveys and health examinations at a coal mine from July to August 2018 were selected by cluster sampling. Basic employee information, ergonomic characteristics, and the occurrence of WMSDs at each body site were collected, and multivariate logistic regression was used for analysis.
Results: The incidence rate of WMSDs in at least one body site among underground coal mine workers in the past year was 62.22% (219/352). The three most affected sites were the lower back (44.32%, 156/352), neck (26.14%, 92/352), and knee (26.14%, 92/352). Multivariate logistic regression showed that frequently exerting great force with the arms or hands during work (OR=2.223, 95% CI: 1.022-4.836), prolonged static forward bending (OR=1.544, 95% CI: 1.305-1.972), frequently exerting great effort to operate tools or machines (OR=2.206, 95% CI: 1.011-4.813), absence of external support systems (OR=1.589, 95% CI: 1.349-1.996), and repetitive whole-body twisting (OR=1.523, 95% CI: 1.298-1.916) were risk factors for lower back WMSDs (P<0.05). Night shift work (OR=1.564, 95% CI: 1.339-1.939) and frequent forward neck flexion (OR=1.532, 95% CI: 1.312-1.907) were both risk factors for neck WMSDs (P<0.05). Lifting heavy objects above the shoulder (OR=1.333, 95% CI: 1.142-1.782), an uncomfortable posture with inability to exert force (OR=1.873, 95% CI: 1.104-2.712), use of vibrating tools (OR=2.958, 95% CI: 1.255-6.972), and length of service >10 years (OR=1.525, 95% CI: 1.105-1.967) were risk factors for knee WMSDs (P<0.05).
Conclusion: The incidence of WMSDs among underground coal miners is relatively high, concentrated mainly in the lower back, neck, and knee, and is related to poor working postures and work organization. Coal mining enterprises should improve work organization, provide appropriate work equipment, and distribute workloads reasonably.
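Each odds ratio above comes from a multivariate logistic regression coefficient. A minimal sketch of that relationship follows, assuming Wald-type intervals that are symmetric on the log scale (the abstract does not state the interval method). The helper names are hypothetical; the example input is the reported vibrating-tool result for knee WMSDs.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard-normal quantile


def odds_ratio_ci(beta, se, z=Z_95):
    """Odds ratio and Wald 95% CI from a logistic-regression coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coefficient_from_or_ci(odds_ratio, lower, upper, z=Z_95):
    """Back-calculate beta and its SE from a reported OR and 95% CI.

    Assumes the CI is symmetric on the log-odds scale (Wald interval).
    """
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Use of vibrating tools and knee WMSDs: OR=2.958, 95% CI 1.255-6.972
beta, se = coefficient_from_or_ci(2.958, 1.255, 6.972)
print(odds_ratio_ci(beta, se))  # recovers approximately (2.958, 1.255, 6.972)
```

The wide interval for this factor translates into a large standard error on the log-odds scale, which is consistent with the relatively small number of knee WMSD cases (92/352).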
3.Study on risk prediction model of hypertension in steel workers
Keyun GUO ; Yaxin ZHU ; Yixuan ZHANG ; Chen YANG ; Hao ZHAO ; Yulan JIN
Chinese Journal of Industrial Hygiene and Occupational Diseases 2025;43(8):573-579
Objective: To identify risk factors for hypertension among steelworkers and establish an effective, easily implemented hypertension prediction model.
Methods: In September 2023, 2,214 steelworkers were selected as study subjects. Basic demographic information, lifestyle, and occupational exposure data were collected, along with physical measurements such as height, weight, and blood pressure. Factors influencing hypertension were identified by multivariate unconditional logistic regression, with reference to the relevant literature. Logistic regression, support vector machine (SVM), random forest, extreme gradient boosting (XGBoost), and light gradient boosting machine (LGBM) models were built and compared in Python 3.9. Model performance was evaluated with receiver operating characteristic (ROC) curves, accuracy, calibration curves, and F1 scores. Shapley Additive Explanations (SHAP) were used for feature importance analysis to improve the interpretability of the prediction model.
Results: Hypertension was detected in 432 of the 2,214 subjects (19.51%). Age, smoking status, salt intake, use of cooling equipment, carbon monoxide exposure, family history of hypertension, fasting blood glucose, triglycerides, and hemoglobin were independent risk factors for hypertension (P<0.05). The five models performed as follows: logistic regression, accuracy 0.853, F1 score 0.680, Brier score 0.108, AUC 0.907; SVM, accuracy 0.863, F1 score 0.687, Brier score 0.081, AUC 0.910; random forest, accuracy 0.857, F1 score 0.603, Brier score 0.105, AUC 0.861; XGBoost, accuracy 0.850, F1 score 0.684, Brier score 0.117, AUC 0.899; LGBM, accuracy 0.838, F1 score 0.625, Brier score 0.112, AUC 0.870.
Conclusion: The SVM model performed best on every reported metric, effectively assessing hypertension risk among steelworkers and supporting targeted health management interventions.
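The AUC values compared above can be computed without plotting the ROC curve, because the AUC equals the probability that a randomly chosen case receives a higher predicted score than a randomly chosen non-case (the Mann-Whitney formulation). The sketch below shows that computation and then orders the five models using the metrics reported in the abstract. The ranking rule (AUC first, Brier score as tie-breaker) is an illustrative assumption, not the authors' stated selection procedure.

```python
def auc_mann_whitney(y_true, scores):
    """ROC AUC as P(score of a random case > score of a random non-case); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg): fine for a sketch, slow for large data
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Metrics as reported in the abstract: (accuracy, F1, Brier score, AUC)
REPORTED = {
    "Logistic regression": (0.853, 0.680, 0.108, 0.907),
    "SVM": (0.863, 0.687, 0.081, 0.910),
    "Random forest": (0.857, 0.603, 0.105, 0.861),
    "XGBoost": (0.850, 0.684, 0.117, 0.899),
    "LGBM": (0.838, 0.625, 0.112, 0.870),
}


def rank_models(metrics):
    """Order models by AUC (higher is better), breaking ties by Brier score (lower is better)."""
    return sorted(metrics, key=lambda name: (-metrics[name][3], metrics[name][2]))


print(rank_models(REPORTED))
# ['SVM', 'Logistic regression', 'XGBoost', 'LGBM', 'Random forest']
```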
7.Effects of daily mean temperature and other meteorological variables on bacillary dysentery in Beijing-Tianjin-Hebei region, China.
Qinxue CHANG ; Keyun WANG ; Honglu ZHANG ; Changping LI ; Yong WANG ; Huaiqi JING ; Shanshan LI ; Yuming GUO ; Zhuang CUI ; Wenyi ZHANG
Environmental Health and Preventive Medicine 2022;27(0):13-13
BACKGROUND:
Although previous studies have shown that meteorological factors such as temperature are related to the incidence of bacillary dysentery (BD), research on the non-linear and interaction effects of meteorological variables remains limited. The objective of this study was to analyze the effects of temperature and other meteorological variables on BD in the Beijing-Tianjin-Hebei region, a high-risk area for BD.
METHODS:
Our study used daily data on BD cases and meteorological variables from 2014 to 2019. A generalized additive model (GAM) was used to explore the relationship between meteorological variables and BD cases, and a distributed lag non-linear model (DLNM) was used to analyze lag and cumulative effects. Interaction effects and stratified analyses were assessed with the GAM.
RESULTS:
A total of 147,001 cases were reported from 2014 to 2019. The relationship between temperature and BD was approximately linear above 0 °C, but the turning point of the overall temperature effect was 10 °C. DLNM results indicated that the effect of high temperature was significant at lags of 5 and 6 days, with each 5 °C rise associated with a 3% increase in BD cases [relative risk (RR) = 1.03, 95% confidence interval (CI): 1.02-1.05]. Over a cumulative lag of 7 days, BD cases increased by 31% for each 5 °C rise in temperature above 10 °C (RR = 1.31, 95% CI: 1.30-1.33). Interaction and stratified analyses showed that BD incidence was highest in hot and humid conditions.
CONCLUSIONS:
This study suggests that temperature significantly affects the incidence of BD and that its effect is enhanced by humidity and precipitation, meaning that hot and humid conditions increase BD incidence.
Beijing/epidemiology*; China/epidemiology*; Dysentery, Bacillary/epidemiology*; Humans; Humidity; Temperature
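The relative risks in this abstract are stated per 5 °C rise. Under a log-linear assumption (the abstract describes the relationship as approximately linear above 0 °C, with a turning point at 10 °C), an RR for one increment can be rescaled to another, and a DLNM-style cumulative effect is the exponentiated sum of lag-specific log-RR slopes. The sketch below illustrates both ideas; the helper names and the lag-specific slopes are hypothetical, and rescaling beyond the range where linearity holds is not supported by the study.

```python
import math


def log_rr_per_degree(rr, increment_c):
    """Per-degree log-RR, assuming the effect is log-linear over the range of interest."""
    return math.log(rr) / increment_c


def rr_for_change(beta_per_degree, delta_c):
    """RR for a temperature change of delta_c degrees under the same log-linear assumption."""
    return math.exp(beta_per_degree * delta_c)


def cumulative_rr(lag_betas, delta_c):
    """Cumulative RR over lags 0..L: sum the lag-specific log-RR slopes, then exponentiate."""
    return math.exp(delta_c * sum(lag_betas))


# Reported: cumulative 7-day RR = 1.31 per 5 °C rise above 10 °C
beta = log_rr_per_degree(1.31, 5)
print(round(rr_for_change(beta, 1), 4))   # RR per 1 °C rise
print(round(rr_for_change(beta, 10), 4))  # RR per 10 °C rise, only if linearity holds that far

# Hypothetical lag-specific slopes (per °C) for lags 0-7, for illustration only
hypothetical_lags = [0.002, 0.004, 0.006, 0.008, 0.010, 0.012, 0.010, 0.002]
print(round(cumulative_rr(hypothetical_lags, 5), 4))
```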
8.A multi-center and retrospective analysis of missed diagnosis of colorectal polyps
Jinfeng WU ; Xiqiu YU ; Keyun CHEN ; Dongjun FAN ; Jianwei WU ; Yuqing GUO ; Xuming HUANG ; Guangchao YANG ; Jintao LIU
Chinese Journal of Digestive Endoscopy 2017;34(5):318-321
Objective: To study missed diagnosis of colorectal polyps during colonoscopy and its risk factors.
Methods: Data from 655 patients who underwent repeat colonoscopy within 3 months (90 days) at three endoscopy centers in Shenzhen were analyzed. Polyp and patient miss rates were calculated. Logistic regression was used to identify candidate risk factors for missed diagnosis, including patient sex, age, and symptoms, and polyp number, shape, and location.
Results: A total of 459 polyps (20.47%, 459/2,242) were missed in 224 patients (34.20%, 224/655); the remaining 1,783 polyps in the 655 patients were detected. The patient miss rate rose as the polyp count increased from 1 to 4, but the differences were not significant. A polyp count of more than 5 was an independent risk factor for patient-level miss during colonoscopy (OR=4.98, P<0.01). Polyps in male patients were more likely to be missed than those in female patients (OR=1.76, P<0.01). Polyp size less than 5 mm was an independent risk factor for missed diagnosis (OR=2.94, P<0.01), as was flat morphology (Yamada types Ⅰ and Ⅱ: OR=2.72, P=0.01 and OR=3.23, P<0.01, respectively).
Conclusion: The polyp miss rate is related to patient sex, baseline polyp count, and polyp size and shape. Polyps in male patients with multiple polyps, and flat or small polyps, are more likely to be missed.
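The two miss rates in this abstract use different denominators: the polyp miss rate divides missed polyps by all polyps found across both colonoscopies, while the patient miss rate divides patients with at least one missed polyp by all patients. A minimal sketch follows; the per-patient record format is a hypothetical assumption, and the aggregate check reads 1,783 as the polyps found at the first examination, since 1,783 + 459 = 2,242.

```python
from typing import Iterable, Tuple


def miss_rates(records: Iterable[Tuple[int, int]]) -> Tuple[float, float]:
    """Compute (polyp miss rate, patient miss rate).

    records: one hypothetical (polyps_detected_first, polyps_missed) pair per patient.
    Polyp miss rate   = missed polyps / all polyps found across both examinations.
    Patient miss rate = patients with >= 1 missed polyp / all patients.
    """
    records = list(records)
    detected = sum(d for d, _ in records)
    missed = sum(m for _, m in records)
    patients_with_miss = sum(1 for _, m in records if m > 0)
    return missed / (detected + missed), patients_with_miss / len(records)


# Aggregate counts from the abstract
print(f"polyp miss rate:   {459 / (1783 + 459):.2%}")  # 20.47%
print(f"patient miss rate: {224 / 655:.2%}")           # 34.20%
```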
