1.Occupation classification model based on DistilKoBERT: using the 5th and 6th Korean Working Condition Surveys
Tae-Yeon KIM ; Seong-Uk BAEK ; Myeong-Hun LIM ; Byungyoon YUN ; Domyung PAEK ; Kyung Ehi ZOH ; Kanwoo YOUN ; Yun Keun LEE ; Yangho KIM ; Jungwon KIM ; Eunsuk CHOI ; Mo-Yeol KANG ; YoonHo CHO ; Kyung-Eun LEE ; Juho SIM ; Juyeon OH ; Heejoo PARK ; Jian LEE ; Jong-Uk WON ; Yu-Min LEE ; Jin-Ha YOON
Annals of Occupational and Environmental Medicine 2024;36(1):e19-
Accurate occupation classification is essential in various fields, including policy development and epidemiological studies. This study aimed to develop an occupation classification model based on DistilKoBERT. We used data from the 5th and 6th Korean Working Conditions Surveys, conducted in 2017 and 2020, respectively. A total of 99,665 survey participants, nationally representative of Korean workers, were included. We used their natural language responses describing their job responsibilities, together with occupational codes based on the Korean Standard Classification of Occupations (7th version, 3-digit codes). The dataset was randomly split into training and test datasets in a 7:3 ratio. The DistilKoBERT-based model was fine-tuned on the training dataset and evaluated on the test dataset, with accuracy, precision, recall, and F1 score as evaluation metrics. The final model, which classified 28,996 survey participants in the test dataset into 142 occupational codes, achieved an accuracy of 84.44%. The precision, recall, and F1 score, each weighted by the sample size of each occupational code, were 0.83, 0.84, and 0.83, respectively. The model showed high precision in classifying service and sales workers but low precision in classifying managers. In addition, precision was high for occupations that were well represented in the training dataset. This study developed a DistilKoBERT-based occupation classification model with reasonable performance. Although further efforts are needed to improve classification accuracy, this automated occupation classification model holds promise for advancing epidemiological studies in occupational safety and health.
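The abstract reports precision, recall, and F1 weighted by sample size. A minimal sketch of that support-weighted averaging follows, assuming the usual convention in which each occupational code's score is weighted by its number of true test samples and codes that are never predicted get precision 0. It is not the authors' evaluation code, the helper names are hypothetical, and the three-digit codes in the usage example are placeholders. scikit-learn's `precision_recall_fscore_support(..., average="weighted")` is expected to give the same numbers.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall, and F1 over the classes in y_true.

    Each class's score is weighted by its support (number of true samples).
    A class that is never predicted is assigned precision 0 (assumed convention).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)          # true samples per class
    predicted = Counter(y_pred)        # predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)

    precision = recall = f1 = 0.0
    for label, n_true in support.items():
        tp = true_pos[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / n_true
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = n_true / n            # share of the test set in this class
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted code equals the true code."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical 3-digit occupation codes, for illustration only.
y_true = ["111", "111", "222", "333"]
y_pred = ["111", "222", "222", "333"]
print(weighted_prf(y_true, y_pred), accuracy(y_true, y_pred))
```

One consequence of this weighting is that weighted recall reduces algebraically to overall accuracy (the supports cancel), which is consistent with the reported recall of 0.84 alongside an accuracy of 84.44%. Weighted precision, by contrast, can diverge from accuracy when rare codes are over- or under-predicted.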
4.Social Inequities in the Survival of Liver Cancer: A Nationwide Cohort Study in Korea, 2007–2017
Mia SON ; Hye-Ri KIM ; Seung-Ah CHOE ; Seo-Young SONG ; Kyu-Hyoung LIM ; Myung KI ; Yeon Jeong HEO ; Minseo CHOI ; Seok-Ho GO ; Domyung PAEK
Journal of Korean Medical Science 2024;39(12):e130-
Background:
To analyze the effects of socioeconomic status (type of insurance and income level) and cancer stage on the survival of patients with liver cancer in Korea.
Methods:
A retrospective cohort was constructed using data from the Healthcare Big Data Platform project in Korea between January 1, 2007, and December 31, 2017. A total of 143,511 patients in Korea diagnosed with liver cancer (International Classification of Diseases, 10th Revision [ICD-10] codes C22, C220, and C221) were followed for an average of 11 years; of these, 110,443 died. Patients' insurance type and income level were used as indicators of socioeconomic status. Unadjusted and adjusted hazard ratios (HRs) and 95% confidence intervals (CIs) were calculated using a Cox proportional hazards regression model to analyze the effects of sex, age, cancer stage at first diagnosis (Surveillance, Epidemiology, and End Results [SEER] stage), type of insurance, and income level on the survival of patients with liver cancer. Interaction effects of the type of insurance, income level, and cancer stage on liver cancer death were also analyzed.
Results:
Among patients with liver cancer (ICD-10 code C22), the lowest income group (medical aid) showed a higher risk of mortality than the highest income group (1–6) (HR [95% CI], 1.37 [1.27–1.47] for all patients, 1.44 [1.32–1.57] for men, and 1.16 [1.01–1.34] for women). The risk of liver cancer death was also higher in the lowest income group diagnosed at a distant cancer stage (SEER = 7) than in any other group.
Conclusion:
Liver cancer patients with lower socioeconomic status and more advanced cancer stages were at greater risk of death. Reducing social inequalities is necessary to improve mortality rates among patients in lower social class groups who present with advanced cancer.
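Hazard ratios and their 95% CIs, like those reported above, are derived from log-scale Cox regression coefficients. The sketch below assumes Wald-type intervals, exp(β ± 1.96·SE), which is the common default but is not stated in the abstract; the helper names are hypothetical. It also backs out the log-scale standard error implied by a reported interval, a quick consistency check on published figures. The same arithmetic applies to odds ratios from logistic regression.

```python
import math

# Two-sided 95% quantile of the standard normal distribution.
Z_95 = 1.959963984540054


def ratio_with_ci(beta, se, z=Z_95):
    """Turn a log-scale coefficient (Cox or logistic) into a ratio with a Wald CI.

    Returns (ratio, lower, upper) = exp(beta), exp(beta - z*se), exp(beta + z*se).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, z=Z_95):
    """Back out the log-scale standard error implied by a reported Wald CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Reported: HR 1.37 (1.27-1.47) for the medical aid group, all patients.
se = se_from_ci(1.27, 1.47)
print(se, ratio_with_ci(math.log(1.37), se))
```

For the reported HR of 1.37 (1.27–1.47), the implied standard error is about 0.037, and feeding it back into `ratio_with_ci` recovers the published interval to within rounding.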
5.Widening Social Inequalities in Cancer Mortality of Children Under 5 Years in Korea
Mia SON ; Hye Ri KIM ; Seung-Ah CHOE ; Myung KI ; Fran YONG ; Mijin PARK ; Domyung PAEK
Journal of Korean Medical Science 2023;38(2):e20-
Background:
To investigate the effect of parental social class on cancer mortality in children under 5 years of age in Korea, two birth cohorts were constructed by linking national birth data to under-5 death data from Statistics Korea for 1995–1999 (3,323,613 births) and 2010–2014 (2,297,876 births).
Methods:
The Cox proportional hazards model adjusted for covariates was used in this study.
Results:
Social inequalities in under-5 cancer mortality risk by paternal education and paternal employment status were greater in 2010–2014 than in 1995–1999. The hazard ratio (HR) of under-5 cancer mortality for lower (high school or below) versus higher (university or higher) paternal education increased from 1.23 (95% confidence interval, 1.04–1.46) in 1995–1999 to 1.45 (1.11–1.97) in 2010–2014; the HR for parents engaged in manual versus non-manual work increased from 1.32 (1.12–1.56) in 1995–1999 to 1.45 (1.12–1.89) in 2010–2014 for fathers, and from 1.18 (0.7–1.98) to 1.69 (1.03–2.79) for mothers. When parental social class was lower, the risk of under-5 cancer mortality was higher in not only adverse but also normal births.
Conclusion:
Social inequalities must be addressed to reduce the disparity in cancer mortality of children under 5 years old.
6.Income Disparity in Breast Cancer Incidence and Stage at Presentation: A National Population Study of South Korea
Seung-Ah CHOE ; Minji ROH ; Hye Ri KIM ; Soohyeon LEE ; Myung KI ; Domyung PAEK ; Mia SON
Journal of Breast Cancer 2022;25(5):415-424
Purpose:
This study aims to explore income-based disparities in breast cancer (BC) incidence and stage at presentation in a national population in South Korea, where a National Cancer Screening Program (NCSP) has been implemented.
Methods:
In 2007, new patients with BC were identified using the Korea Central Cancer Registry database. We calculated adjusted odds ratios (aORs) to evaluate the association between individual income level and the risk of distant stage BC at presentation, adjusting for women’s age, body mass index, disability registration, employment, region of residence, and year of diagnosis.
Results:
The cumulative age-standardized incidence of BC over the 11 years was highest among women in the richest quintile (2,040 per 100,000 women over 11 years), whereas the proportion of distant stage BC at presentation was highest (10.2%) among medical aid beneficiaries. The aOR of distant stage diagnosis at presentation was higher for lower-income quintiles, and, relative to the richest quintile, the risk was highest among medical aid beneficiaries (aOR, 2.25; 95% confidence interval, 1.97–2.58). The income-based gradient in aORs for distant stage did not differ between younger (< 40 years) and older patients.
Conclusion:
A higher risk of distant stage BC at presentation was observed among the lower-income and medical aid groups in the context of an NCSP. A more focused approach toward women in lower-income groups is necessary to alleviate the disparity in the risk of advanced BC.
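The incidence figure above (2,040 per 100,000 women) is age-standardized. The abstract does not say which method or standard population was used, so the sketch below assumes direct standardization, the most common approach: age-specific rates are weighted by a fixed standard population. The function name, age groups, and numbers are hypothetical.

```python
def direct_age_standardized_rate(cases, population, standard_pop, per=100_000):
    """Directly age-standardized rate per `per` persons.

    cases[i] and population[i] are the observed cases and population at risk
    in age group i; standard_pop[i] is that group's size in the (assumed)
    standard population used as weights.
    """
    if not (len(cases) == len(population) == len(standard_pop)):
        raise ValueError("all inputs need one entry per age group")
    if any(p <= 0 for p in population):
        raise ValueError("population at risk must be positive in every age group")
    total_std = sum(standard_pop)
    rate = 0.0
    for c, p, w in zip(cases, population, standard_pop):
        rate += (c / p) * (w / total_std)  # age-specific rate x standard weight
    return rate * per


# Two hypothetical age groups with a hypothetical standard population.
print(direct_age_standardized_rate([10, 40], [10_000, 20_000], [60_000, 40_000]))
```

Because every group is weighted by the same standard population, differences between income quintiles in such a rate are not driven by differences in their age structures.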
7.Association of discrimination and presenteeism with cardiovascular disease: the Fourth Korean Working Conditions Survey
Kyusung KIM ; Sung il CHO ; Domyung PAEK
Annals of Occupational and Environmental Medicine 2019;31(1):e28-
BACKGROUND: Discrimination is a representative social determinant of health. Presenteeism is defined as presenting to work despite illness and is an indicator of group health. We investigated the association of discrimination and presenteeism with cardiovascular disease using Korean data. METHODS: This study used the fourth Korea Working Conditions Survey (2014) data of 27,662 wage workers (employees). Presenteeism and discrimination related to age, sex, education, birth region, and employment type were ascertained. Self-reported cardiovascular disease was also assessed using the survey questionnaire. General and occupational characteristics found to be significant in univariate analyses were entered into a multivariate logistic regression analysis of the association of discrimination and presenteeism with cardiovascular disease. We also calculated the odds ratios of multiple discriminations and/or presenteeism for cardiovascular disease. RESULTS: In the univariate analyses, sex, age, education, monthly income, employment type, occupation, hours worked per week, workplace scale, and shift work were significantly associated with cardiovascular disease. A multivariate logistic regression analysis adjusted for general and occupational characteristics showed that discrimination and presenteeism were significantly associated with cardiovascular disease. Finally, the association with cardiovascular disease was strongest when both multiple discriminations and presenteeism were present. CONCLUSIONS: Discrimination and presenteeism are associated with cardiovascular disease, and this association was stronger in the presence of multiple types of discrimination and presenteeism.
Cardiovascular Diseases; Discrimination (Psychology); Education; Employment; Korea; Logistic Models; Occupations; Odds Ratio; Parturition; Presenteeism; Salaries and Fringe Benefits
8.Association between organizational justice and depressive symptoms among securities company workers
HyunSuk LEE ; KangHyun UM ; YoungSu JU ; Sukkoun LEE ; Min CHOI ; Domyung PAEK ; Seong Sik CHO
Annals of Occupational and Environmental Medicine 2019;31(1):e7-
BACKGROUND: The organizational justice model can evaluate job stressors arising from decision-making processes, the attitudes of managerial or senior staff toward junior workers, and unfair distribution of resources. Stress from organizational injustice could be harmful to workers' mental health. The purpose of this study is to explore the association between organizational justice and depressive symptoms in a securities company. METHODS: To estimate organizational justice, a Korean translation of Moorman's organizational justice evaluation questionnaire was employed. Cronbach's α coefficient was estimated to assess the internal consistency of the translated questionnaire. To assess depressive symptoms, the Center for Epidemiologic Studies Depression (CES-D) scale was used. The association between the subconcepts of the organizational justice model and depressive symptoms was assessed using multiple logistic regression models. RESULTS: The risk of depressive symptoms was significantly higher among workers with higher levels of all subcategories of organizational injustice. In the fully adjusted model, the odds ratios (ORs) for higher levels of procedural, relational, and distributional injustice were 2.79 (95% confidence interval [CI], 1.58–4.90), 4.25 (95% CI, 2.66–6.78), and 4.53 (95% CI, 2.63–7.83), respectively. Cronbach's α coefficient of the Korean version was 0.93 for procedural justice, 0.93 for relational justice, and 0.95 for distributive justice. CONCLUSIONS: A higher level of organizational injustice was linked to a higher prevalence of depressive symptoms among workers in a financial industry company.
Depression; Epidemiologic Studies; Logistic Models; Mental Health; Odds Ratio; Prevalence; Social Justice
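Internal consistency of the translated questionnaire was summarized with Cronbach's α, defined for k items as α = k/(k − 1) · (1 − Σ σᵢ² / σ_total²), where σᵢ² is the variance of item i and σ_total² is the variance of respondents' summed scores. A minimal sketch using only the Python standard library follows; it is the textbook formula rather than the authors' analysis code, and the respondent data in the usage line are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Sample variance (n - 1) is used for both item and total variances; the
    ratio is unchanged by that choice as long as it is applied consistently.
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in item_scores):
        raise ValueError("every respondent must answer all k items")
    # Variance of each item across respondents.
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Variance of each respondent's summed score.
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total score variance is zero; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 3 respondents x 2 items on a Likert-type scale.
print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))
```

Items that move together inflate the total-score variance relative to the sum of item variances, pushing α toward 1, which is how values such as 0.93 to 0.95 indicate high internal consistency.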
9.Health effects of environmental pollution in population living near industrial complex areas in Korea.
Sang Yong EOM ; Jonghyuk CHOI ; Sanghyuk BAE ; Ji Ae LIM ; Guen Bae KIM ; Seung Do YU ; Yangho KIM ; Hyun Sul LIM ; Bu Soon SON ; Domyung PAEK ; Yong Dae KIM ; Heon KIM ; Mina HA ; Ho Jang KWON
Environmental Health and Toxicology 2018;33(1):e2018004-
Several epidemiological studies have reported an association between environmental pollution and various health conditions in individuals residing in industrial complexes. To evaluate the effects of pollution from industrial complexes on human health, we performed a pooled analysis of environmental epidemiologic monitoring data for residents living near national industrial complexes in Korea. Respiratory and allergic symptoms and the prevalence of acute and chronic diseases, including cancer, were used as the outcome variables for health effects. Multiple logistic regression analysis was used to analyze the relationship between exposure to pollution from industrial complexes and health conditions. After adjusting for age, sex, smoking status, occupational exposure, level of education, and body mass index, residents near the industrial complexes were found to have more respiratory symptoms, such as cough (odds ratio [OR], 1.18; 95% confidence interval [CI], 1.06 to 1.31) and sputum production (OR, 1.13; 95% CI, 1.03 to 1.24), and symptoms of atopic dermatitis (OR, 1.10; 95% CI, 1.01 to 1.20). Among residents of the industrial complexes, the prevalence of acute eye disorders was approximately 40% higher (OR, 1.39; 95% CI, 1.04 to 1.84), and the prevalence of lung and uterine cancer was 3.45 and 1.88 times higher, respectively, than among residents of the control area. This study showed that residents living in the vicinity of industrial complexes have a higher risk of acute and chronic diseases, including respiratory and allergic conditions. These results can serve as objective baseline data for developing health management measures for individuals residing near industrial complexes.
Body Mass Index; Chronic Disease; Cough; Dermatitis, Atopic; Education; Employment; Environmental Pollution*; Epidemiologic Studies; Epidemiological Monitoring; Humans; Korea*; Logistic Models; Lung; Prevalence; Smoke; Smoking; Sputum; Uterine Neoplasms