1. Advancing Korean Medical Large Language Models: Automated Pipeline for Korean Medical Preference Dataset Construction
Jean SEO ; Sumin PARK ; Sungjoo BYUN ; Jinwook CHOI ; Jinho CHOI ; Hyopil SHIN
Healthcare Informatics Research 2025;31(2):166-174
Objectives:
Developing large language models (LLMs) in biomedicine requires access to high-quality training and alignment tuning datasets. However, publicly available Korean medical preference datasets are scarce, hindering the advancement of Korean medical LLMs. This study constructs the Korean Medical Preference Dataset (KoMeP), an alignment tuning dataset built with an automated pipeline that minimizes the high cost of human annotation, and evaluates its efficacy.
Methods:
KoMeP was generated using the DAHL score, an automated hallucination evaluation metric. Five LLMs (Dolly-v2-3B, MPT-7B, GPT-4o, Qwen-2-7B, Llama-3-8B) produced responses to 8,573 biomedical examination questions, from which 5,551 preference pairs were extracted. Each pair consisted of a “chosen” response and a “rejected” response, as determined by their DAHL scores. The dataset was evaluated by training five different models with each of two alignment tuning methods: direct preference optimization (DPO) and odds ratio preference optimization (ORPO). The KorMedMCQA benchmark was employed to assess the effectiveness of alignment tuning.
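A minimal sketch of how such score-based preference pairs might be assembled is shown below; the function name, the score orientation (higher DAHL score taken as more factual), and the tie-skipping threshold are illustrative assumptions, not the authors' released pipeline.

```python
# Illustrative sketch: build a (chosen, rejected) preference pair from
# per-response hallucination scores for one question. The assumption
# that a higher DAHL score means a more factual response, and all
# names below, are for illustration only.

def build_preference_pair(question_id, responses, dahl_scores, min_gap=0.0):
    """Pair the highest-scoring response (chosen) with the lowest-scoring
    one (rejected); return None when there is no usable preference signal."""
    ranked = sorted(zip(responses, dahl_scores), key=lambda r: r[1], reverse=True)
    chosen, best = ranked[0]
    rejected, worst = ranked[-1]
    if best - worst <= min_gap:  # skip ties: no preference to learn from
        return None
    return {"id": question_id, "chosen": chosen, "rejected": rejected}

# Example: responses from several models to the same exam question.
pair = build_preference_pair(
    "q-0001",
    ["answer A ...", "answer B ...", "answer C ..."],
    [0.91, 0.42, 0.77],
)
# -> {"id": "q-0001", "chosen": "answer A ...", "rejected": "answer B ..."}
```

Pairs in this chosen/rejected format are exactly what preference-based trainers such as DPO and ORPO consume, which is why a pairwise extraction step like this one precedes alignment tuning.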
Results:
Models trained with DPO consistently improved KorMedMCQA performance; notably, Llama-3.1-8B showed a 43.96% increase. In contrast, ORPO training produced inconsistent results. Additionally, English-to-Korean transfer learning proved effective, particularly for English-centric models like Gemma-2, whereas Korean-to-English transfer learning achieved limited success. Instruction tuning with KoMeP yielded mixed outcomes, which suggests challenges in dataset formatting.
Conclusions:
KoMeP is the first publicly available Korean medical preference dataset and significantly improves alignment tuning performance in LLMs. The DPO method outperforms ORPO in alignment tuning. Future work should focus on expanding KoMeP, developing a Korean-native dataset, and refining alignment tuning methods to produce safer and more reliable Korean medical LLMs.
2. Analysis of Adverse Drug Reactions Identified in Nursing Notes Using Reinforcement Learning
Eunjoo JEON ; Youngsam KIM ; Hojun PARK ; Rae Woong PARK ; Hyopil SHIN ; Hyeoun-Ae PARK
Healthcare Informatics Research 2020;26(2):104-111
Electronic health record (EHR)-based surveillance systems are being actively developed to detect adverse drug reactions (ADRs), but progress is hindered by the difficulty of extracting data from unstructured records. This study analyzed ADRs in nursing notes for drug safety surveillance using the temporal difference (TD) method from reinforcement learning. Nursing notes of 8,316 patients (4,158 ADR and 4,158 non-ADR cases) admitted to Ajou University Hospital were used for the ADR classification task. A TD(λ) model was used to estimate state values indicating ADR risk. For TD learning, each nursing phrase was encoded into one of seven states, and the state values estimated during training were employed in the subsequent testing phase. We applied logistic regression to the state values from the TD(λ) model for the classification task. The overall accuracy of TD-based logistic regression, 0.63, was comparable to that of two machine learning methods (0.64 for a naïve Bayes classifier and 0.63 for a support vector machine), while it outperformed two deep learning-based methods (0.58 for a text convolutional neural network and 0.61 for a long short-term memory network). Most importantly, the TD-based method was found to estimate state values according to the context of nursing phrases. TD learning is a promising approach because it can exploit contextual, time-dependent aspects of the available data and provide an analysis of ADR severity in a fully incremental manner.
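A minimal sketch of tabular TD(λ) value estimation over a note's sequence of encoded phrase states is given below; the seven-state encoding matches the abstract, but the end-of-note reward scheme and all hyperparameters are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np

N_STATES = 7  # each nursing phrase is encoded into one of seven states

def td_lambda_update(V, episode, reward, alpha=0.1, gamma=0.9, lam=0.8):
    """Update state values V in place for one note.

    episode: sequence of state indices (one per nursing phrase, in order)
    reward:  terminal signal at the end of the note (e.g., 1.0 for an
             ADR case, 0.0 otherwise) -- an assumption for illustration
    """
    z = np.zeros(N_STATES)                      # accumulating eligibility traces
    for t, s in enumerate(episode):
        last = t == len(episode) - 1
        r = reward if last else 0.0             # reward only at note end
        v_next = 0.0 if last else V[episode[t + 1]]
        delta = r + gamma * v_next - V[s]       # TD error
        z[s] += 1.0                             # mark current state as eligible
        V += alpha * delta * z                  # credit all recently visited states
        z *= gamma * lam                        # decay traces over time
    return V

# Example: train on one ADR note whose phrases map to states 0, 3, 5, 2.
V = np.zeros(N_STATES)
V = td_lambda_update(V, episode=[0, 3, 5, 2], reward=1.0)
```

After training over many notes, per-note aggregates of these state values could serve as the features passed to the logistic regression classifier described above; how the values were aggregated is not specified in the abstract.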