1.Synthetic data production for biomedical research
Yun Gyeong LEE ; Mi-Sook KWAK ; Jeong Eun KIM ; Min Sun KIM ; Dong Un NO ; Hee Youl CHAI
Osong Public Health and Research Perspectives 2025;16(2):94-99
Synthetic data, generated using advanced artificial intelligence (AI) techniques, replicates the statistical properties of real-world datasets while excluding identifiable information. Although synthetic data does not consist of actual data points, it is derived from original datasets, thereby enabling analyses that yield results comparable to those obtained with real data. Synthetic datasets are evaluated based on their utility, a measure of how effectively they mirror real data for analytical purposes. This paper presents the generation of synthetic datasets through the Healthcare Big Data Showcase Project (2019–2023). The original dataset comprises comprehensive multi-omics data from 400 individuals, including cancer survivors, chronic disease patients, and healthy participants. Synthetic data facilitates efficient access and robust analyses, serving as a practical tool for research and education. It addresses privacy concerns, supports AI research, and provides a foundation for innovative applications across diverse fields, such as public health and precision medicine.
6.Lazertinib versus Gefitinib as First-Line Treatment for EGFR-mutated Locally Advanced or Metastatic NSCLC: LASER301 Korean Subset
Ki Hyeong LEE ; Byoung Chul CHO ; Myung-Ju AHN ; Yun-Gyoo LEE ; Youngjoo LEE ; Jong-Seok LEE ; Joo-Hang KIM ; Young Joo MIN ; Gyeong-Won LEE ; Sung Sook LEE ; Kyung-Hee LEE ; Yoon Ho KO ; Byoung Yong SHIM ; Sang-We KIM ; Sang Won SHIN ; Jin-Hyuk CHOI ; Dong-Wan KIM ; Eun Kyung CHO ; Keon Uk PARK ; Jin-Soo KIM ; Sang Hoon CHUN ; Jangyoung WANG ; SeokYoung CHOI ; Jin Hyoung KANG
Cancer Research and Treatment 2024;56(1):48-60
Purpose:
This subgroup analysis of the Korean subset of patients in the phase 3 LASER301 trial evaluated the efficacy and safety of lazertinib versus gefitinib as first-line therapy for epidermal growth factor receptor mutated (EGFRm) non–small cell lung cancer (NSCLC).
Materials and Methods:
Patients with locally advanced or metastatic EGFRm NSCLC were randomized 1:1 to lazertinib (240 mg/day) or gefitinib (250 mg/day). The primary endpoint was investigator-assessed progression-free survival (PFS).
Results:
In total, 172 Korean patients were enrolled (lazertinib, n=87; gefitinib, n=85). Baseline characteristics were balanced between the treatment groups. One-third of patients had brain metastases (BM) at baseline. Median PFS was 20.8 months (95% confidence interval [CI], 16.7 to 26.1) for lazertinib and 9.6 months (95% CI, 8.2 to 12.3) for gefitinib (hazard ratio [HR], 0.41; 95% CI, 0.28 to 0.60). This was supported by PFS analysis based on blinded independent central review. Significant PFS benefit with lazertinib was consistently observed across predefined subgroups, including patients with BM (HR, 0.28; 95% CI, 0.15 to 0.53) and those with L858R mutations (HR, 0.36; 95% CI, 0.20 to 0.63). Lazertinib safety data were consistent with its previously reported safety profile. Common adverse events (AEs) in both groups included rash, pruritus, and diarrhoea. Numerically fewer severe AEs and severe treatment–related AEs occurred with lazertinib than with gefitinib.
Conclusion:
Consistent with results for the overall LASER301 population, this analysis showed significant PFS benefit with lazertinib versus gefitinib with comparable safety in Korean patients with untreated EGFRm NSCLC, supporting lazertinib as a new potential treatment option for this patient population.
7.Clinical Practice Recommendations for the Use of Next-Generation Sequencing in Patients with Solid Cancer: A Joint Report from KSMO and KSP
Miso KIM ; Hyo Sup SHIM ; Sheehyun KIM ; In Hee LEE ; Jihun KIM ; Shinkyo YOON ; Hyung-Don KIM ; Inkeun PARK ; Jae Ho JEONG ; Changhoon YOO ; Jaekyung CHEON ; In-Ho KIM ; Jieun LEE ; Sook Hee HONG ; Sehhoon PARK ; Hyun Ae JUNG ; Jin Won KIM ; Han Jo KIM ; Yongjun CHA ; Sun Min LIM ; Han Sang KIM ; Choong-kun LEE ; Jee Hung KIM ; Sang Hoon CHUN ; Jina YUN ; So Yeon PARK ; Hye Seung LEE ; Yong Mee CHO ; Soo Jeong NAM ; Kiyong NA ; Sun Och YOON ; Ahwon LEE ; Kee-Taek JANG ; Hongseok YUN ; Sungyoung LEE ; Jee Hyun KIM ; Wan-Seop KIM
Cancer Research and Treatment 2024;56(3):721-742
In recent years, next-generation sequencing (NGS)–based genetic testing has become crucial in cancer care. While its primary objective is to identify actionable genetic alterations to guide treatment decisions, its scope has broadened to encompass aiding in pathological diagnosis and exploring resistance mechanisms. With the ongoing expansion in NGS application and reliance, a compelling necessity arises for expert consensus on its application in solid cancers. To address this demand, the forthcoming recommendations not only provide pragmatic guidance for the clinical use of NGS but also systematically classify actionable genes based on specific cancer types. Additionally, these recommendations will incorporate expert perspectives on crucial biomarkers, ensuring informed decisions regarding circulating tumor DNA panel testing.
8.A 10-Gene Signature to Predict the Prognosis of Early-Stage Triple-Negative Breast Cancer
Chang Min KIM ; Kyong Hwa PARK ; Yun Suk YU ; Ju Won KIM ; Jin Young PARK ; Kyunghee PARK ; Jong-Han YU ; Jeong Eon LEE ; Sung Hoon SIM ; Bo Kyoung SEO ; Jin Kyeoung KIM ; Eun Sook LEE ; Yeon Hee PARK ; Sun-Young KONG
Cancer Research and Treatment 2024;56(4):1113-1125
Purpose:
Triple-negative breast cancer (TNBC) is a particularly challenging subtype of breast cancer, with a poorer prognosis compared to other subtypes. Unfortunately, unlike luminal-type cancers, there is no validated biomarker to predict the prognosis of patients with early-stage TNBC. Accurate biomarkers are needed to establish effective therapeutic strategies.
Materials and Methods:
In this study, we analyzed gene expression profiles of tumor samples from 184 TNBC patients (training cohort, n=76; validation cohort, n=108) using RNA sequencing.
Results:
By combining weighted gene expression, we identified a 10-gene signature (DGKH, GADD45B, KLF7, LYST, NR6A1, PYCARD, ROBO1, SLC22A20P, SLC24A3, and SLC45A4) that stratified patients by risk score with high sensitivity (92.31%), specificity (92.06%), and accuracy (92.11%) for invasive disease-free survival. The 10-gene signature was validated in a separate institution cohort and supported by meta-analysis for biological relevance to well-known driving pathways in TNBC. Furthermore, the 10-gene signature was the only independent factor for invasive disease-free survival in multivariate analysis when compared to other potential biomarkers of TNBC molecular subtypes and T-cell receptor β diversity. The 10-gene signature also further categorized patients, already classified by molecular subtype, according to risk score.
Conclusion:
Our novel findings may help address the prognostic challenges in TNBC and the 10-gene signature could serve as a novel biomarker for risk-based patient care.
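As a reading aid for the sensitivity, specificity, and accuracy figures reported in this abstract, the sketch below shows how the three metrics follow from a 2×2 confusion matrix. The counts are hypothetical: the abstract does not report them, and they are chosen only because they happen to reproduce the stated percentages for a group of 76 patients. The function name `diagnostic_metrics` is likewise an illustrative assumption, not part of the study.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)              # events correctly flagged as high risk
    specificity = tn / (tn + fp)              # non-events correctly flagged as low risk
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts (NOT taken from the paper); they sum to 76.
sens, spec, acc = diagnostic_metrics(tp=12, fn=1, tn=58, fp=5)
print(f"sensitivity={sens:.2%} specificity={spec:.2%} accuracy={acc:.2%}")
```

Other count combinations may give the same rounded percentages, so this only illustrates the arithmetic and does not reconstruct the study data.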
9.Extrahepatic malignancies and antiviral drugs for chronic hepatitis B: A nationwide cohort study
Moon Haeng HUR ; Dong Hyeon LEE ; Jeong-Hoon LEE ; Mi-Sook KIM ; Jeayeon PARK ; Hyunjae SHIN ; Sung Won CHUNG ; Hee Jin CHO ; Min Kyung PARK ; Heejoon JANG ; Yun Bin LEE ; Su Jong YU ; Sang Hyub LEE ; Yong Jin JUNG ; Yoon Jun KIM ; Jung-Hwan YOON
Clinical and Molecular Hepatology 2024;30(3):500-514
Background/Aims:
Chronic hepatitis B (CHB) is related to an increased risk of extrahepatic malignancy (EHM), and antiviral treatment is associated with an incidence of EHM comparable to controls. We compared the risks of EHM and intrahepatic malignancy (IHM) between entecavir (ETV) and tenofovir disoproxil fumarate (TDF) treatment.
Methods:
Using data from the National Health Insurance Service of Korea, this nationwide cohort study included treatment-naïve CHB patients who initiated ETV (n=24,287) or TDF (n=29,199) therapy between 2012 and 2014. The primary outcome was the development of any primary EHM. Secondary outcomes included overall IHM development. E-value was calculated to assess the robustness of results to unmeasured confounders.
Results:
The median follow-up duration was 5.9 years, and all baseline characteristics were well balanced after propensity score matching. The EHM incidence rate differed significantly between the periods within and beyond 3 years in both groups (P<0.01, Davies test). During the first 3 years, EHM risk was comparable in the propensity score-matched cohort (5.88 versus 5.84/1,000 person-years; subdistribution hazard ratio [SHR]=1.01, 95% confidence interval [CI]=0.88–1.17, P=0.84). After year 3, however, TDF was associated with a significantly lower EHM incidence compared to ETV (4.92 versus 6.91/1,000 person-years; SHR=0.70, 95% CI=0.60–0.81, P<0.01; E-value for SHR=2.21). Regarding IHM, the superiority of TDF over ETV was maintained both within the first 3 years (17.58 versus 20.19/1,000 person-years; SHR=0.88, 95% CI=0.81–0.95, P<0.01) and after year 3 (11.45 versus 16.20/1,000 person-years; SHR=0.68, 95% CI=0.62–0.75, P<0.01; E-value for SHR=2.30).
Conclusions:
TDF was associated with approximately 30% lower risks of both EHM and IHM than ETV in CHB patients after 3 years of antiviral therapy.
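The "approximately 30% lower" wording can be traced back to the reported subdistribution hazard ratios with simple arithmetic. The sketch below is a minimal illustration using only numbers quoted in the abstract; the helper names `relative_reduction` and `crude_rate_ratio` are assumptions introduced here. The crude ratio of incidence rates is shown for comparison only; it is not the SHR, which comes from the study's competing-risk model.

```python
def relative_reduction(hazard_ratio: float) -> float:
    """Relative risk reduction implied by a hazard ratio below 1."""
    return 1.0 - hazard_ratio


def crude_rate_ratio(rate_a: float, rate_b: float) -> float:
    """Unadjusted ratio of two incidence rates given in the same units."""
    return rate_a / rate_b


# Values after year 3, as quoted in the abstract (TDF versus ETV).
ehm_reduction = relative_reduction(0.70)    # EHM, SHR = 0.70
ihm_reduction = relative_reduction(0.68)    # IHM, SHR = 0.68
ehm_crude = crude_rate_ratio(4.92, 6.91)    # per 1,000 person-years
ihm_crude = crude_rate_ratio(11.45, 16.20)  # per 1,000 person-years

print(f"EHM: {ehm_reduction:.0%} lower by SHR, crude rate ratio {ehm_crude:.2f}")
print(f"IHM: {ihm_reduction:.0%} lower by SHR, crude rate ratio {ihm_crude:.2f}")
```

Both crude ratios fall close to the model-based SHRs, which is consistent with the "approximately 30%" summary.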
10.Difference in Baseline Antimicrobial Prescription Patterns of Hospitals According to Participation in the National Antimicrobial Monitoring and Feedback System in Korea
Jihye SHIN ; Ji Young PARK ; Jungmi CHAE ; Hyung-Sook KIM ; Song Mi MOON ; Eunjeong HEO ; Se Yoon PARK ; Dong Min SEO ; Ha-Jin CHUN ; Yong Chan KIM ; Myung Jin LEE ; Kyungmin HUH ; Hyo Jung PARK ; I Ji YUN ; Su Jin JEONG ; Jun Yong CHOI ; Dong-Sook KIM ; Bongyoung KIM
Journal of Korean Medical Science 2024;39(29):e216
This study aimed to evaluate the differences in the baseline characteristics and patterns of antibiotic usage among hospitals based on their participation in the Korea National Antimicrobial Use Analysis System (KONAS). We obtained claims data from the National Health Insurance for inpatients admitted to all secondary- and tertiary-care hospitals between January 2020 and December 2021 in Korea. Of the hospitals, 15.9% (58/395) were KONAS participants, among which the proportions of hospitals with > 900 beds (31.0% vs. 2.6%, P < 0.001) and of tertiary-care hospitals (50.0% vs. 5.2%, P < 0.001) were higher than among non-participants. The consumption of antibiotics targeting antimicrobial-resistant gram-positive bacteria (33.7 vs. 27.1 days of therapy [DOT]/1,000 patient-days, P = 0.019) and antibiotics predominantly used for resistant gram-negative bacteria (4.8 vs. 3.7 DOT/1,000 patient-days, P = 0.034) was higher in KONAS-participating than in non-participating hospitals. The current KONAS data do not fully represent all secondary- and tertiary-care hospitals in Korea; thus, the KONAS results should be interpreted with caution.
