1.Advancing Korean Medical Large Language Models: Automated Pipeline for Korean Medical Preference Dataset Construction
Jean SEO ; Sumin PARK ; Sungjoo BYUN ; Jinwook CHOI ; Jinho CHOI ; Hyopil SHIN
Healthcare Informatics Research 2025;31(2):166-174
Objectives:
Developing large language models (LLMs) in biomedicine requires access to high-quality training and alignment tuning datasets. However, publicly available Korean medical preference datasets are scarce, hindering the advancement of Korean medical LLMs. This study constructs and evaluates the efficacy of the Korean Medical Preference Dataset (KoMeP), an alignment tuning dataset constructed with an automated pipeline, minimizing the high costs of human annotation.
Methods:
KoMeP was generated using the DAHL score, an automated hallucination evaluation metric. Five LLMs (Dolly-v2-3B, MPT-7B, GPT-4o, Qwen-2-7B, Llama-3-8B) produced responses to 8,573 biomedical examination questions, from which 5,551 preference pairs were extracted. Each pair consisted of a “chosen” response and a “rejected” response, as determined by their DAHL scores. The dataset was evaluated by training five different models with each of two alignment tuning methods: direct preference optimization (DPO) and odds ratio preference optimization (ORPO). The KorMedMCQA benchmark was employed to assess the effectiveness of alignment tuning.
Results:
Models trained with DPO consistently improved KorMedMCQA performance; notably, Llama-3.1-8B showed a 43.96% increase. In contrast, ORPO training produced inconsistent results. Additionally, English-to-Korean transfer learning proved effective, particularly for English-centric models like Gemma-2, whereas Korean-to-English transfer learning achieved limited success. Instruction tuning with KoMeP yielded mixed outcomes, which suggests challenges in dataset formatting.
Conclusions:
KoMeP is the first publicly available Korean medical preference dataset and significantly improves alignment tuning performance in LLMs. The DPO method outperforms ORPO in alignment tuning. Future work should focus on expanding KoMeP, developing a Korean-native dataset, and refining alignment tuning methods to produce safer and more reliable Korean medical LLMs.
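The pairing step described in the Methods above can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Response` type, the `build_preference_pair` helper, the `min_gap` threshold, and the assumption that a higher DAHL score means fewer hallucinations are all hypothetical, introduced only to show how a "chosen" and a "rejected" response might be picked from scored candidates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    model: str         # name of the LLM that produced the answer
    text: str          # the generated answer
    dahl_score: float  # ASSUMPTION: higher score = fewer hallucinations


def build_preference_pair(question: str,
                          responses: list[Response],
                          min_gap: float = 0.0) -> Optional[dict]:
    """Return a prompt/chosen/rejected record, or None if no clear preference.

    Hypothetical rule: take the highest- and lowest-scoring responses and
    drop the question when their scores differ by no more than `min_gap`.
    """
    if len(responses) < 2:
        return None
    best = max(responses, key=lambda r: r.dahl_score)
    worst = min(responses, key=lambda r: r.dahl_score)
    if best.dahl_score - worst.dahl_score <= min_gap:
        return None  # tied scores give no usable preference signal
    return {"prompt": question, "chosen": best.text, "rejected": worst.text}


if __name__ == "__main__":
    candidates = [
        Response("model-a", "Answer with few errors", 0.9),
        Response("model-b", "Answer with many errors", 0.2),
    ]
    print(build_preference_pair("Example question", candidates))
```

A filter of this kind is one possible reason that fewer pairs than questions survive (5,551 pairs from 8,573 questions), although the abstract does not state the actual selection rule.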
4.Poly(rC) binding protein 2 acts as a negative regulator of IRES-mediated translation of Hr mRNA
Jeong Ki KIM ; Injung KIM ; Keonwoo CHOI ; Jee Hyun CHOI ; Eunmin KIM ; Hwa Young LEE ; Jongkeun PARK ; Sungjoo KIM YOON
Experimental & Molecular Medicine 2018;50(2):e441-
During the hair follicle (HF) cycle, HR protein expression is not concordant with the presence of the Hr mRNA transcript, suggesting an elaborate regulation of Hr gene expression. Here we present evidence that the 5′ untranslated region (UTR) of the Hr gene has internal ribosome entry site (IRES) activity and that this activity is regulated by the binding of poly(rC) binding protein 2 (PCBP2) to Hr mRNA. Overexpression and knockdown of PCBP2 resulted in a decrease in Hr 5′ UTR IRES activity and an increase in HR protein expression, respectively, without changing mRNA levels. We also found that this regulation was disrupted in a mutant Hr 5′ UTR that has a mutation responsible for Marie Unna hereditary hypotrichosis (MUHH) in both mice and humans. These findings suggest that Hr mRNA expression is regulated at the post-transcriptional level via IRES-mediated translation control through interaction with PCBP2, but not in MUHH.
5.The Impact of CDH13 Polymorphism and Statin Administration on TG/HDL Ratio in Cardiovascular Patients.
Jung Ran CHOI ; Yangsoo JANG ; Sungjoo KIM YOON ; Jong Keun PARK ; Sungbin Richard SORN ; Mi Young PARK ; Myoungsook LEE
Yonsei Medical Journal 2015;56(6):1604-1612
PURPOSE: Adiponectin is expressed in adipose tissue, and is affected by smoking, obesity, and genetic factors, such as CDH13 polymorphism, contributing to the development of coronary vascular diseases (CVDs). MATERIALS AND METHODS: We investigated the effect of genetic variations of CDH13 (rs3865188) on blood chemistry and adiponectin levels in 345 CVD patients undergoing statin-free or statin treatment. RESULTS: Genetic variation in CDH13 was significantly correlated with several clinical factors, including adiponectin, diastolic blood pressure, triglyceride (TG), and insulin levels. Subjects with the T allele (mutant form) had significantly lower adiponectin levels than those with the A allele. Total cholesterol (TC), low-density lipoprotein cholesterol (LDLc), TG/high-density lipoprotein cholesterol (HDLc) ratio, and HDL3b subtype were markedly decreased in statin-treated subjects regardless of having the A or T allele. TG and TG/HDL in the statin-free group with the TT genotype of rs3865188 were higher than in the others, but they did not differ among the statin-treated subjects. We observed a significant difference in adiponectin levels between patients with the A and T alleles in the statin-free group; meanwhile, no difference in adiponectin levels was noted in the statin group. Plasma levels of other cytokines, leptin, visfatin, interleukin-6 (IL-6), and tumor necrosis factor-alpha (TNF-alpha), were not different among the CDH13 genotypes according to statin administration. Body mass index (BMI), TG, insulin, HDL3b, and TG/HDL ratio showed negative correlations with adiponectin levels. CONCLUSION: Plasma adiponectin levels and TG/HDL ratio were significantly different according to variants of CDH13 and statin administration in Korean patients with CVD.
Adiponectin/blood/*genetics; Adult; Aged; Alleles; Blood Pressure/genetics; Body Mass Index; Cadherins/blood/*genetics; Cholesterol; Cholesterol, LDL; Female; Genotype; Humans; Hydroxymethylglutaryl-CoA Reductase Inhibitors/*therapeutic use; Insulin; Interleukin-6; Leptin/genetics; Lipoproteins, HDL/genetics; Male; Middle Aged; Obesity/blood; Polymorphism, Genetic; Triglycerides/genetics; Tumor Necrosis Factor-alpha/genetics; Vascular Diseases/*drug therapy
6.Single Nucleotide Deletion Mutation of KCNH2 Gene is Responsible for LQT Syndrome in a 3-Generation Korean Family.
Jong Keun PARK ; Yong Seog OH ; Jee Hyun CHOI ; Sungjoo Kim YOON
Journal of Korean Medical Science 2013;28(9):1388-1393
Long QT syndrome (LQTS) is characterized by prolongation of the QT interval on ECG and a predisposition to life-threatening arrhythmia, which often leads to sudden cardiac death. We encountered a 3-generation family with 5 affected family members in which LQTS was inherited in an autosomal dominant manner. LQTS is considered an ion channel disorder in which the type and location of the genetic mutation determine to a large extent the expression of the clinical syndrome. Upon screening the genomic sequences of cardiac potassium ion channel genes, we found a single nucleotide C deletion mutation in exon 3 of the KCNH2 gene that co-segregates with LQTS in this family. This mutation presumably resulted in a frameshift mutation, P151fs+15X. This study added a new genetic cause to the pool of mutations that lead to defective potassium ion channels in the heart.
Adolescent; Adult; Aged; Aged, 80 and over; Asian Continental Ancestry Group/*genetics; DNA Mutational Analysis; Ether-A-Go-Go Potassium Channels/*genetics; Exons; Female; Frameshift Mutation; Genotype; Humans; Long QT Syndrome/*diagnosis/genetics; Male; Middle Aged; Pedigree; Republic of Korea; Sequence Deletion
7.The Hairless Gene: A Putative Navigator of Hair Follicle Development.
Jeong Ki KIM ; Bong Kyu KIM ; Jong Keun PARK ; Jee Hyun CHOI ; Sungjoo KIM YOON
Genomics & Informatics 2011;9(3):93-101
The Hairless (HR) gene regulates the expression of several target genes as a transcriptional corepressor of nuclear receptors. The hair follicle (HF), a small independent organ of the skin, resides in the epidermis and undergoes regenerative cycling for normal hair formation. HF development requires many genes and signaling pathways to function properly in time and space, one of them being the HR gene. Various mutations of the HR gene have been reported to cause the hair loss phenotype in rodents and humans. In recent studies, it has been suggested that the HR gene is a critical player in the regulation of the hair cycle and, thus, HF development. Furthermore, the HR gene is associated with the Wnt signaling pathway, which regulates proliferation and differentiation of cells and plays an essential role in hair and skin development. In this review, we summarize the mutations responsible for human hair disorders and discuss the roles of the HR gene in HF development.
Epidermis; Hair; Hair Follicle; Humans; Phenotype; Receptors, Cytoplasmic and Nuclear; Rodentia; Skin; Wnt Signaling Pathway
8.A clinical study of ectopic pregnancy during recent 8 years.
Sungho PARK ; Yonsik NA ; Jiyoon JUNG ; Seongcheon YANG ; Suran CHOI ; Sungjoo KIM ; Pong Rheem JANG ; Yong Il KWON
Korean Journal of Obstetrics and Gynecology 2009;52(2):245-252
OBJECTIVE: The study was designed to ascertain a proper method of early diagnosis and treatment of ectopic pregnancy by analyzing its clinical and epidemiological characteristics. METHODS: The medical records of patients who were diagnosed with ectopic pregnancy at Hallym Medical Center from January 1, 2000 to December 31, 2007 were reviewed. RESULTS: The incidence of ectopic pregnancy was 7.3% (1,067) out of 14,519 deliveries. The most frequent age group was 26~30 (29.5%). The risk factors were previous histories of abdominal or pelvic surgery (37.0%), artificial abortion (30.8%), pelvic inflammatory disease (12%), and tubal sterilization (9.6%). The most frequent clinical symptoms were amenorrhea (88.7%), lower abdominal pain (81.2%), and vaginal spotting (60.0%). The hemoglobin level was over 10.0 g/dL in 79% of patients and below 8.0 g/dL in 3.9%. The clinical symptoms of ectopic pregnancy most commonly occurred 6~8 weeks after the last menstrual period (47%). Ectopic gestation was implanted in the fallopian tube in 89%, the cornu in 7.2%, the ovary in 1.1%, and the cervix in 2.7%. Laparoscopic surgeries were performed in 755 cases (71.6%), laparotomies in 273 cases (25.9%), and dilatation and curettage in 26 cases (2.5%). Salpingectomy was performed most frequently (82.4%). Methotrexate (MTX) treatment was successful in 13 cases (1.21%). CONCLUSION: The early diagnosis of ectopic pregnancy is most useful when serum beta-hCG and vaginal sonography are used together. Laparoscopy would be a preferred method because of its short hospitalization period and low complication rate compared with laparotomy in ectopic pregnancy treatment.
Abdominal Pain; Amenorrhea; Cervix Uteri; Cornus; Curettage; Dilatation; Early Diagnosis; Fallopian Tubes; Female; Hemoglobins; Hospitalization; Humans; Incidence; Laparoscopy; Laparotomy; Medical Records; Methotrexate; Metrorrhagia; Ovary; Pelvic Inflammatory Disease; Pregnancy; Pregnancy, Ectopic; Risk Factors; Salpingectomy; Sterilization, Tubal
9.Tissue-specific expression and subcellular localization of ALADIN, the absence of which causes human triple A syndrome.
A Ri CHO ; Keum Jin YANG ; Yoonsun BAE ; Young Yil BAHK ; Eunmin KIM ; Hyungnam LEE ; Jeong Ki KIM ; Wonsang PARK ; Hyanshuk RHIM ; Soo Young CHOI ; Tsuneo IMANAKA ; Sungdae MOON ; Jongbok YOON ; Sungjoo Kim YOON
Experimental & Molecular Medicine 2009;41(6):381-386
Triple A syndrome is a rare genetic disorder caused by mutations in the achalasia-addisonianism-alacrima syndrome (AAAS) gene which encodes a tryptophan aspartic acid (WD) repeat-containing protein named alacrima-achalasia-adrenal insufficiency neurologic disorder (ALADIN). Northern blot analysis shows that the 2.1 kb AAAS mRNA is expressed in various tissues with stronger expression in testis and pancreas. We show that human ALADIN is a protein with an apparent molecular weight of 60 kDa, and expressed in the adrenal gland, pituitary gland and pancreas. Furthermore, biochemical analysis using anti-ALADIN antibody supports the previous finding of the localization of ALADIN in the nuclear membrane. The mutations S544G and S544X show that alteration of S544 residue affects correct targeting of ALADIN to the nuclear membrane.
Adrenal Insufficiency/*genetics; Antibodies/immunology; Cloning, Molecular; DNA, Complementary/genetics; Esophageal Achalasia/*genetics; Gene Expression Profiling; Hela Cells; Humans; Lacrimal Apparatus Diseases/*genetics; Mutagenesis, Site-Directed; Nerve Tissue Proteins/*analysis/*genetics/immunology; Nuclear Pore/chemistry; Nuclear Pore Complex Proteins/*analysis/*genetics/immunology; RNA, Messenger/analysis/genetics; Syndrome; Tissue Distribution
10.The Effect of Cigarette Price on Smoking Behavior in Korea.
Woojin CHUNG ; Seungji LIM ; Sunmi LEE ; Sungjoo CHOI ; Kayoung SHIN ; Kyungsook CHO
Journal of Preventive Medicine and Public Health 2007;40(5):371-380
OBJECTIVES: To determine the impact of cigarette prices on the decision to initiate and quit smoking by taking into account the interdependence of smoking and other behavioral risk factors. METHODS: The study population consisted of 3,000 male Koreans aged ≥20. A survey by telephone interview was undertaken to collect information on cigarette price, smoking and other behavioral risk factors. A two-part model was used to examine separately the effect of price on the decision to be a smoker, and on the amount of cigarettes smoked. RESULTS: The overall price elasticity of cigarettes was estimated at -0.66, with a price elasticity of -0.02 for smoking participation and -0.64 for the amount of cigarettes consumed by smokers. The inclusion of other behavioral risk factors reduced the estimated price elasticity for smoking participation substantially, but had no effect on the conditional price elasticity for the quantity of cigarettes smoked. CONCLUSIONS: From the public health and financial perspectives, an increase in cigarette price would significantly reduce smoking prevalence as well as cigarette consumption by smokers in Korea.
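The elasticity figures reported above fit the usual two-part decomposition, in which the overall price elasticity is the sum of the participation elasticity and the conditional (quantity) elasticity. The sketch below only reproduces that arithmetic; the function names and the 10% price increase are illustrative assumptions, not part of the study, and the linear approximation holds only for small price changes.

```python
def overall_elasticity(participation: float, conditional: float) -> float:
    """Two-part model: overall elasticity = participation + conditional."""
    return participation + conditional


def approx_pct_change(elasticity: float, pct_price_change: float) -> float:
    """First-order approximation of the % change in consumption."""
    return elasticity * pct_price_change


if __name__ == "__main__":
    # Values reported in the abstract
    participation = -0.02  # effect of price on the decision to smoke
    conditional = -0.64    # effect of price on quantity among smokers

    total = overall_elasticity(participation, conditional)
    print(round(total, 2))

    # Hypothetical 10% price increase (illustrative assumption)
    print(round(approx_pct_change(total, 10.0), 1))
```

Under these figures, nearly all of the estimated response comes from smokers consuming less rather than from fewer people smoking.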
Adult; *Costs and Cost Analysis; Health Behavior; Humans; Korea/epidemiology; Male; Middle Aged; Risk Factors; Smoking/*economics/*prevention & control; Social Environment; Socioeconomic Factors; *Tobacco
