1.Advancing Korean Medical Large Language Models: Automated Pipeline for Korean Medical Preference Dataset Construction
Jean SEO ; Sumin PARK ; Sungjoo BYUN ; Jinwook CHOI ; Jinho CHOI ; Hyopil SHIN
Healthcare Informatics Research 2025;31(2):166-174
Objectives:
Developing large language models (LLMs) in biomedicine requires access to high-quality training and alignment tuning datasets. However, publicly available Korean medical preference datasets are scarce, hindering the advancement of Korean medical LLMs. This study constructs the Korean Medical Preference Dataset (KoMeP), an alignment tuning dataset built with an automated pipeline that minimizes the high cost of human annotation, and evaluates its efficacy.
Methods:
KoMeP was generated using the DAHL score, an automated hallucination evaluation metric. Five LLMs (Dolly-v2-3B, MPT-7B, GPT-4o, Qwen-2-7B, Llama-3-8B) produced responses to 8,573 biomedical examination questions, from which 5,551 preference pairs were extracted. Each pair consisted of a “chosen” response and a “rejected” response, as determined by their DAHL scores. The dataset was evaluated by training five different models with each of two alignment tuning methods, direct preference optimization (DPO) and odds ratio preference optimization (ORPO). The KorMedMCQA benchmark was used to assess the effectiveness of alignment tuning.
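The pair-extraction step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that a higher DAHL score means fewer hallucinations, that the highest- and lowest-scoring responses to a question become the “chosen” and “rejected” responses, and that questions whose score gap is too small are dropped (a hypothetical filter). The names `Response`, `build_pair`, and `min_margin` are hypothetical, and the prompt/chosen/rejected record layout is a common convention for preference data rather than KoMeP's documented schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    model: str   # name of the generating LLM, e.g. "GPT-4o"
    text: str    # the generated answer
    dahl: float  # assumption: higher DAHL score = fewer hallucinations


def build_pair(question: str, responses: list[Response],
               min_margin: float = 0.0) -> Optional[dict]:
    """Return one chosen/rejected record for a question, or None.

    Hypothetical rule: the best-scoring response is "chosen", the
    worst-scoring one is "rejected", and the question is skipped when
    the score gap does not exceed min_margin.
    """
    if len(responses) < 2:
        return None  # a preference pair needs at least two responses
    ranked = sorted(responses, key=lambda r: r.dahl, reverse=True)
    best, worst = ranked[0], ranked[-1]
    if best.dahl - worst.dahl <= min_margin:
        return None  # scores too close to express a clear preference
    return {"prompt": question, "chosen": best.text, "rejected": worst.text}
```

Because each question contributes at most one record, a filter of this kind naturally yields fewer pairs than questions.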
Results:
Models trained with DPO consistently improved KorMedMCQA performance; notably, Llama-3.1-8B showed a 43.96% increase. In contrast, ORPO training produced inconsistent results. Additionally, English-to-Korean transfer learning proved effective, particularly for English-centric models like Gemma-2, whereas Korean-to-English transfer learning achieved limited success. Instruction tuning with KoMeP yielded mixed outcomes, which suggests challenges in dataset formatting.
Conclusions:
KoMeP is the first publicly available Korean medical preference dataset and significantly improves alignment tuning performance in LLMs. The DPO method outperforms ORPO in alignment tuning. Future work should focus on expanding KoMeP, developing a Korean-native dataset, and refining alignment tuning methods to produce safer and more reliable Korean medical LLMs.
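For readers comparing the two alignment methods, the per-pair objectives can be written in a few lines. The sketch below follows the commonly stated formulations of DPO (a logistic loss on the difference between policy-to-reference log-probability ratios of the chosen and rejected responses) and ORPO (a negative log-likelihood term on the chosen response plus a log-odds-ratio penalty, with no reference model). It is not the training code used for KoMeP; the default values of `beta` and `lam`, and the use of plain floats in place of model outputs, are assumptions for illustration.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one pair.

    Each argument is a summed token log-probability log p(y | x), taken from
    the policy being trained (pi_*) or the frozen reference model (ref_*).
    """
    # How much more the policy favors chosen over rejected, relative to the reference
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


def orpo_loss(avg_chosen: float, avg_rejected: float,
              lam: float = 0.1) -> float:
    """ORPO loss for one pair.

    Arguments are length-normalized log-probabilities (mean per-token
    log p), which must be strictly negative.
    """
    def log_odds(avg_logp: float) -> float:
        # log(p / (1 - p)) with p = exp(avg_logp)
        return avg_logp - math.log1p(-math.exp(avg_logp))

    nll = -avg_chosen  # supervised fine-tuning term on the chosen response
    odds_ratio = log_odds(avg_chosen) - log_odds(avg_rejected)
    return nll - lam * log_sigmoid(odds_ratio)
```

When the policy matches the reference, the DPO loss equals log 2 and falls as the policy shifts probability toward the chosen response; ORPO needs no reference model, which is one practical difference between the two methods.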
4.Several issues regarding the diagnostic imaging of medication-related osteonecrosis of the jaw
Jo-Eun KIM ; Sumin YOO ; Soon-Chul CHOI
Imaging Science in Dentistry 2020;50(4):273-279
This review presents an overview of several diagnostic imaging-related issues regarding medication-related osteonecrosis of the jaws (MRONJ), including imaging signs that can predict MRONJ in patients taking antiresorptive drugs, the early imaging features of MRONJ, the relationship between the presence or absence of bone exposure and imaging features, and differences in imaging features by stage, between advanced MRONJ and conventional osteomyelitis, between oncologic and osteoporotic patients with MRONJ, and according to the type of medication, route of administration, and duration of medication. MRONJ can be diagnosed early from subtle imaging changes, such as thickening of the lamina dura or cortical bone, rather than from the presence of bone exposure. Because most imaging features are relatively non-specific, each patient’s clinical findings and history should be taken into account. Oral and maxillofacial radiologists and dentists should closely monitor the plain radiographs of patients taking antiresorptive/antiangiogenic drugs.
5.Dietary effect of green tea extract on hydration improvement and metabolism of free amino acid generation in epidermis of UV-irradiated hairless mice.
Sumin CHOI ; Jihye SHIN ; Bomin LEE ; Yunhi CHO
Journal of Nutrition and Health 2016;49(5):269-276
PURPOSE: Ultraviolet (UV) irradiation decreases epidermal hydration by reducing natural moisturizing factors (NMFs). Among the various NMFs, free amino acids (AAs) are major constituents generated by filaggrin degradation. This experiment was conducted to determine whether dietary supplementation with green tea extract (GTE) in UV-irradiated mice can improve epidermal hydration, the levels of filaggrin and free AAs, and the expression of peptidylarginine deiminase-3 (PAD3), an enzyme involved in filaggrin degradation. METHODS: Hairless mice were fed a diet containing 1% GTE for 10 weeks in parallel with UV irradiation (group UV+1%GTE). As controls, hairless mice were fed a control diet with (group UV+) or without (group UV-) UV irradiation. RESULTS: In group UV+, epidermal hydration and filaggrin levels were lower than in group UV-; in group UV+1%GTE, these levels increased to values similar to those of group UV-. Epidermal levels of PAD3 and of the major NMF AAs (alanine, glycine, and serine) were similar in groups UV- and UV+, whereas they increased markedly in group UV+1%GTE. CONCLUSION: Dietary GTE improves epidermal hydration by promoting filaggrin generation and its degradation into AAs.
Alanine; Amino Acids; Animals; Diet; Dietary Supplements; Epidermis*; Glycine; Metabolism*; Mice; Mice, Hairless*; Serine; Tea*
6.Two Sjogren syndrome-associated oral bacteria, Prevotella melaninogenica and Rothia mucilaginosa, induce the upregulation of major histocompatibility complex class I and hypoxia-associated cell death, respectively, in human salivary gland cells
Jaewon LEE ; Sumin JEON ; Youngnim CHOI
International Journal of Oral Biology 2021;46(4):190-199
Despite evidence that bacteria-sensing Toll-like receptors (TLRs) are activated in salivary gland tissues of Sjogren syndrome (SS) patients, the role of oral bacteria in SS etiopathogenesis is unclear. We previously reported that two SS-associated oral bacteria, Prevotella melaninogenica (Pm) and Rothia mucilaginosa (Rm), oppositely regulate the expression of major histocompatibility complex class I (MHC I) in human salivary gland (HSG) cells. Here, we elucidated the mechanisms underlying the differential regulation of MHC I expression by these bacteria. The ability of Pm and Rm to activate TLR2, TLR4, and TLR9 was examined using TLR reporter cells. HSG cells were stimulated with TLR ligands, Pm, or Rm. MHC I expression, bacterial invasion, and the viability of HSG cells were examined by flow cytometry. The hypoxic status of HSG cells was examined using Hypoxia Green. HSG cells upregulated MHC I expression in response to TLR2, TLR4, and TLR9 activation. Both Pm and Rm activated TLR2 and TLR9 but not TLR4. Rm-induced downregulation of MHC I strongly correlated with bacterial invasion and cell death. Rm-induced cell death was not rescued by inhibitors of diverse cell death pathways but was associated with hypoxia. In conclusion, Pm upregulated MHC I, likely through TLR2 and TLR9 activation, whereas Rm induced hypoxia-associated cell death and downregulated MHC I despite its ability to activate TLR2 and TLR9. These findings may provide new insight into how oral dysbiosis can contribute to salivary gland tissue damage in SS.
7.Evaluation of Immunoproteasome-Specific Proteolytic Activity Using Fluorogenic Peptide Substrates
Sumin KIM ; Seo Hyeong PARK ; Won Hoon CHOI ; Min Jae LEE
Immune Network 2022;22(3):e28
The 26S proteasome irreversibly hydrolyzes polyubiquitylated substrates to maintain protein homeostasis; it also regulates immune responses by generating antigenic peptides. An alternative form of the 26S proteasome is the immunoproteasome, which contains substituted catalytic subunits (β1i/PSMB9, β2i/PSMB10, and β5i/PSMB8) in place of their constitutively expressed counterparts (β1/PSMB6, β2/PSMB7, and β5/PSMB5). The immunoproteasome expands the repertoire of peptides presented on MHC class I molecules. However, how its activity changes in these contexts remains largely unclear, possibly because there is no standardized method for evaluating its specific activity. Here, we describe an assay protocol that measures immunoproteasome activity in whole-cell lysates using commercially available fluorogenic peptide substrates. Our results showed that immunoproteasome activity was assessed most accurately by combining the β5i-targeting substrate Ac-ANW-AMC with the immunoproteasome inhibitor ONX-0914. This simple and reliable protocol may contribute to future studies of immunoproteasomes and their pathophysiological roles during viral infection, inflammation, and tumorigenesis.
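The inhibitor-subtraction idea behind combining Ac-ANW-AMC with ONX-0914 can be illustrated with a small calculation. Assuming AMC fluorescence is read at several time points in lysate with and without ONX-0914, the immunoproteasome-specific rate is taken here as the difference between the two fitted slopes, normalized to lysate protein. The function names, the linear-fit readout, and the per-microgram normalization are hypothetical choices for this sketch; the published protocol may define the readout differently.

```python
def slope(times: list[float], values: list[float]) -> float:
    """Least-squares slope of fluorescence vs. time (e.g. RFU per minute)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def immuno_specific_activity(times: list[float],
                             rfu_substrate: list[float],
                             rfu_with_inhibitor: list[float],
                             protein_ug: float) -> float:
    """Inhibitor-sensitive Ac-ANW-AMC cleavage rate per microgram of protein.

    Hypothetical calculation: the rate without ONX-0914 minus the rate with
    ONX-0914, so that substrate hydrolysis the inhibitor does not block is
    discounted. Negative differences (noise) are clamped to zero.
    """
    total_rate = slope(times, rfu_substrate)
    residual_rate = slope(times, rfu_with_inhibitor)
    return max(total_rate - residual_rate, 0.0) / protein_ug
```

Subtracting the inhibitor-resistant rate is what makes the readout specific: a fluorogenic substrate alone can also be cleaved by other proteolytic activity in a whole-cell lysate.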
8.Umami taste receptor suppresses cancer cachexia by regulating skeletal muscle atrophy in vivo and in vitro
Sumin LEE ; Yoonha CHOI ; Yerin KIM ; Yeon Kyung CHA ; Tai Hyun PARK ; Yuri KIM
Nutrition Research and Practice 2024;18(4):451-463
BACKGROUND/OBJECTIVES:
The umami taste receptor (TAS1R1/TAS1R3) is endogenously expressed in skeletal muscle and is involved in myogenesis; however, there is little evidence on whether its expression is involved in muscular diseases. This study aimed to elucidate the effects of the umami taste receptor, and the underlying mechanism, on muscle wasting in cancer cachexia using in vivo and in vitro models.
MATERIALS/METHODS:
The Lewis lung carcinoma-induced cancer cachexia model was used in vivo and in vitro, and the expression of the umami taste receptor and of the muscle atrophy-related markers muscle atrophy F-box protein and muscle RING-finger protein-1 was analyzed.
RESULTS:
Results showed that TAS1R1 was significantly downregulated in vivo and in vitro under the muscle wasting condition. Moreover, overexpression of TAS1R1 in vitro in the human primary cell model protected the cells from muscle atrophy, and knockdown of TAS1R1 using siRNA exacerbated muscle atrophy.
CONCLUSION:
Taken together, the umami taste receptor exerts protective effects against muscle wasting in cancer cachexia by counteracting dysregulated muscle atrophy. These results provide evidence that the umami taste receptor has a therapeutic anti-cancer cachexia effect.
9.Role of fractionated radiotherapy in patients with hemangioma of the cavernous sinus.
Sunmin PARK ; Sang Min YOON ; Sumin LEE ; Jin hong PARK ; Si Yeol SONG ; Sang wook LEE ; Seung Do AHN ; Jong Hoon KIM ; Eun Kyung CHOI
Radiation Oncology Journal 2017;35(3):268-273
PURPOSE: We performed this retrospective study to investigate the outcomes of patients with hemangioma of the cavernous sinus after fractionated radiotherapy. MATERIALS AND METHODS: We analyzed 10 patients with hemangioma of the cavernous sinus who were treated with conventional radiotherapy between January 2000 and December 2016. The median patient age was 54 years (range, 31–65 years), and 8 patients (80.0%) were female. The mean hemangioma volume was 34.1 cm³ (range, 6.8–83.2 cm³), and fractionated radiation was administered to a total dose of 50–54 Gy with a daily dose of 2 Gy. RESULTS: The median follow-up period was 6.8 years (range, 2.2–8.8 years). At last follow-up, the volume of the tumor had decreased in all patients. The average tumor volume reduction rate from the initial volume was 72.9% (range, 18.9–95.3%). All 10 of the cranial neuropathies observed before radiation therapy had improved, with complete symptomatic remission in 9 cases (90%) and partial remission in 1 case (10%). No new acute neurologic impairments were reported after radiotherapy. One probable compressive optic neuropathy was observed at 1 year after radiotherapy. CONCLUSION: Fractionated radiotherapy achieves both symptomatic and radiologic improvements. It is a well-tolerated treatment modality for hemangiomas of the cavernous sinus.
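The dose and volume figures above follow simple arithmetic, sketched below. Delivering 50–54 Gy at 2 Gy per day corresponds to 25–27 fractions. The volume reduction rate is assumed here to be (initial − final) / initial × 100, a conventional definition that the abstract does not state explicitly. The helper names are hypothetical, and the example volumes in the usage note are illustrative rather than patient data.

```python
import math


def fraction_count(total_dose_gy: float, dose_per_fraction_gy: float) -> int:
    """Number of daily fractions needed to reach the total dose."""
    return math.ceil(total_dose_gy / dose_per_fraction_gy)


def volume_reduction_rate(initial_cm3: float, final_cm3: float) -> float:
    """Percentage reduction relative to the initial tumor volume (assumed definition)."""
    return (initial_cm3 - final_cm3) / initial_cm3 * 100.0
```

For example, `fraction_count(50, 2)` and `fraction_count(54, 2)` give the 25- and 27-fraction ends of the reported dose range, and a tumor shrinking from 20.0 cm³ to 5.0 cm³ would have a 75% reduction rate.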
Cavernous Sinus*; Cranial Nerve Diseases; Female; Follow-Up Studies; Hemangioma*; Humans; Optic Nerve Diseases; Radiotherapy*; Retrospective Studies; Tumor Burden