1. Synthetic data production for biomedical research
Yun Gyeong LEE ; Mi-Sook KWAK ; Jeong Eun KIM ; Min Sun KIM ; Dong Un NO ; Hee Youl CHAI
Osong Public Health and Research Perspectives 2025;16(2):94-99
Synthetic data, generated using advanced artificial intelligence (AI) techniques, replicates the statistical properties of real-world datasets while excluding identifiable information. Although synthetic data does not consist of actual data points, it is derived from original datasets, thereby enabling analyses that yield results comparable to those obtained with real data. Synthetic datasets are evaluated based on their utility, a measure of how effectively they mirror real data for analytical purposes. This paper presents the generation of synthetic datasets through the Healthcare Big Data Showcase Project (2019–2023). The original dataset comprises comprehensive multi-omics data from 400 individuals, including cancer survivors, chronic disease patients, and healthy participants. Synthetic data facilitates efficient access and robust analyses, serving as a practical tool for research and education. It addresses privacy concerns, supports AI research, and provides a foundation for innovative applications across diverse fields, such as public health and precision medicine.
3. Non-Linear Association Between Physical Activities and Type 2 Diabetes in 2.4 Million Korean Population, 2009–2022: A Nationwide Representative Study
Wonwoo JANG ; Seokjun KIM ; Yejun SON ; Soeun KIM ; Hayeon LEE ; Jaeyu PARK ; Kyeongmin LEE ; Jiseung KANG ; Damiano PIZZOL ; Jiyoung HWANG ; Sang Youl RHEE ; Dong Keon YON
Journal of Korean Medical Science 2025;40(12):e42-
Background:
Although excessive physical activity (PA) does not always confer additional health benefits, there is a paucity of studies that have quantitatively examined the dose-response relationship between PA and type 2 diabetes. Therefore, this study investigated the relationship between type 2 diabetes prevalence and the intensity, frequency, and metabolic equivalent of task (MET) score of PA in a large population sample.
Methods:
We conducted a nationwide cross-sectional analysis examining sociodemographic variables, PA habits, and type 2 diabetes prevalence in 2,428,448 participants included in the Korea Community Health Survey. The non-linear association between MET score and odds ratios (ORs) for type 2 diabetes prevalence was plotted using a weighted generalized additive model. Categorical analysis was used to examine the joint association of moderate-intensity PA (MPA) and vigorous-intensity PA (VPA), and the influence of PA frequency.
Results:
MET score and diabetes prevalence revealed a non-linear association with the nadir at 1,028 MET-min/week, beyond which ORs increased with additional PA. Joint analysis of MPA and VPA showed the lowest OR of 0.79 (95% confidence interval, 0.75–0.84) for those engaging in 300–600 MET-min/week of MPA and > 600 MET-min/week of VPA concurrently, corresponding with World Health Organization recommendations. Additionally, both “weekend warriors” and “regularly active” individuals showed lower ORs compared to the inactive, although no significant difference was noted between the active groups.
Conclusion:
In a large South Korean sample, higher PA is not always associated with a lower prevalence of type 2 diabetes, as the association follows a non-linear pattern; differences existed across sociodemographic variables. Considering the joint association, an adequate combination of MPA and VPA is recommended. The frequency of PA does not significantly influence the type 2 diabetes prevalence.
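To make the MET-min/week unit used in this abstract concrete, the following is a minimal sketch of how a weekly MET score is commonly computed. The abstract does not state the weights the authors used; the values below (4.0 for moderate and 8.0 for vigorous activity) follow the widely used IPAQ-style convention and are an assumption, as are the function name and the example respondent.

```python
# Assumed MET weights per activity type (IPAQ-style convention;
# NOT stated in the abstract, used here for illustration only).
MET_WEIGHTS = {"moderate": 4.0, "vigorous": 8.0}


def met_minutes_per_week(activity, minutes_per_day, days_per_week):
    """Return weekly MET-minutes for one activity type (hypothetical helper)."""
    return MET_WEIGHTS[activity] * minutes_per_day * days_per_week


# Hypothetical respondent: 30 min of moderate PA on 5 days,
# plus 20 min of vigorous PA on 2 days.
mpa = met_minutes_per_week("moderate", 30, 5)
vpa = met_minutes_per_week("vigorous", 20, 2)
total = mpa + vpa

# The abstract reports the lowest odds at about 1,028 MET-min/week.
REPORTED_NADIR = 1028
print(mpa, vpa, total, total < REPORTED_NADIR)
```

Under these assumed weights, the example respondent's total falls below the reported nadir of 1,028 MET-min/week; different weights would shift the totals accordingly.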
10. Long-term gastrointestinal and hepatobiliary outcomes of COVID-19: A multinational population-based cohort study from South Korea, Japan, and the UK
Kwanjoo LEE ; Jaeyu PARK ; Jinseok LEE ; Myeongcheol LEE ; Hyeon Jin KIM ; Yejun SON ; Sang Youl RHEE ; Lee SMITH ; Masoud RAHMATI ; Jiseung KANG ; Hayeon LEE ; Yeonjung HA ; Dong Keon YON
Clinical and Molecular Hepatology 2024;30(4):943-958
Background/Aims:
Considering emerging evidence on long COVID, comprehensive analyses of the post-acute complications of SARS-CoV-2 infection in the gastrointestinal and hepatobiliary systems are needed. We aimed to investigate the impact of COVID-19 on the long-term risk of gastrointestinal and hepatobiliary diseases and other digestive abnormalities.
Methods:
We used three large-scale population-based cohorts: the Korean cohort (discovery cohort), the Japanese cohort (validation cohort-A), and the UK Biobank (validation cohort-B). A total of 10,027,506 Korean, 12,218,680 Japanese, and 468,617 UK patients aged ≥20 years who had SARS-CoV-2 infection between 2020 and 2021 were matched to non-infected controls. Seventeen gastrointestinal and eight hepatobiliary outcomes as well as nine other digestive abnormalities following SARS-CoV-2 infection were identified and compared with controls.
Results:
The discovery cohort revealed heightened risks of gastrointestinal diseases (HR 1.15; 95% CI 1.08–1.22), hepatobiliary diseases (HR 1.30; 95% CI 1.09–1.55), and other digestive abnormalities (HR 1.05; 95% CI 1.01–1.10) beyond the first 30 days of infection, after exposure-driven propensity score-matching. The risk was pronounced according to the COVID-19 severity. The SARS-CoV-2 vaccination was found to lower the risk of gastrointestinal diseases but did not affect hepatobiliary diseases and other digestive disorders. The results derived from validation cohorts were consistent. The risk profile was most pronounced during the initial 3 months; however, it persisted for >6 months in validation cohorts, but not in the discovery cohort.
Conclusions:
The incidence of gastrointestinal disease, hepatobiliary disease, and other digestive abnormalities increased in patients with SARS-CoV-2 infection during the post-acute phase.
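As a reading aid for the hazard ratios reported in this abstract, the sketch below recovers the approximate standard error of log(HR) from a 95% confidence interval and checks that the point estimate sits near the geometric midpoint of the interval. It assumes a Wald-type interval that is symmetric on the log scale (a common convention, not confirmed by the abstract); the function names are hypothetical, and only the three HRs and intervals come from the text.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def log_hr_standard_error(lower, upper):
    """Approximate SE of log(HR), assuming a Wald-type 95% CI on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def geometric_midpoint(lower, upper):
    """Midpoint of the interval on the log scale, back-transformed."""
    return math.sqrt(lower * upper)


# (HR, lower, upper) as reported for the discovery cohort.
reported = {
    "gastrointestinal": (1.15, 1.08, 1.22),
    "hepatobiliary": (1.30, 1.09, 1.55),
    "other digestive": (1.05, 1.01, 1.10),
}

for name, (hr, lo, hi) in reported.items():
    se = log_hr_standard_error(lo, hi)
    z = math.log(hr) / se  # approximate Wald z-statistic
    mid = geometric_midpoint(lo, hi)
    print(f"{name}: SE(log HR)={se:.3f}, z={z:.2f}, midpoint={mid:.2f}")
```

A wider interval, as for the hepatobiliary outcome, translates into a larger standard error, which is why its estimate is less precise despite the larger HR.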
