1.LLM-Based Response Generation for Korean Adolescents: A Study Using the NAVER Knowledge iN Q&A Dataset with RAG
Junseo KIM ; Seok Jun KIM ; Junseok AHN ; Suehyun LEE
Healthcare Informatics Research 2025;31(2):136-145
Objectives:
This research aimed to develop a retrieval-augmented generation (RAG) based large language model (LLM) system that offers personalized and reliable responses to a wide range of concerns raised by Korean adolescents. Our work focuses on building a culturally reflective dataset and on designing and validating the system’s effectiveness by comparing the answer quality of RAG-based models with non-RAG models.
Methods:
Data were collected from the NAVER Knowledge iN platform, concentrating on posts that featured adolescents’ questions and corresponding expert responses during the period 2014–2024. The dataset comprises 3,874 cases, categorized by key negative emotions and the primary sources of worry. The data were processed to remove irrelevant or redundant content and then classified into general and detailed causes. The RAG-based model employed FAISS for similarity-based retrieval of the top three reference cases and used GPT-4o mini for response generation. The responses generated with and without RAG were evaluated using several metrics.
Results:
RAG-based responses outperformed non-RAG responses across all evaluation metrics. Key findings indicate that RAG-based responses delivered more specific, empathetic, and actionable guidance, particularly when addressing complex emotional and situational concerns. The analysis revealed that family relationships, peer interactions, and academic stress are significant factors affecting adolescents’ worries, with depression and stress frequently co-occurring.
Conclusions:
This study demonstrates the potential of RAG-based LLMs to address the diverse and culture-specific worries of Korean adolescents. By integrating external knowledge and offering personalized support, the proposed system provides a scalable approach to enhancing mental health interventions for adolescents. Future research should concentrate on expanding the dataset and improving multiturn conversational capabilities to deliver even more comprehensive support.
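The retrieval step described in the Methods (similarity-based lookup of the top three reference cases, followed by response generation) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy bag-of-words `embed` function stands in for a real embedding model, the brute-force cosine ranking stands in for a FAISS index, the sample cases are invented, and `build_prompt` only assembles text that would then be passed to a generator such as GPT-4o mini. It is not the authors' implementation.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (hypothetical stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query, cases, k=3):
    """Rank stored cases by similarity to the query and keep the k best.

    Brute-force search is used here in place of a FAISS index.
    """
    q = embed(query)
    ranked = sorted(cases, key=lambda c: cosine(q, embed(c["question"])), reverse=True)
    return ranked[:k]


def build_prompt(query, references):
    """Assemble the retrieved cases and the new question into one prompt string."""
    lines = ["Reference cases:"]
    for i, ref in enumerate(references, start=1):
        lines.append(f"{i}. Q: {ref['question']} / A: {ref['answer']}")
    lines.append(f"New question: {query}")
    return "\n".join(lines)


# Invented sample cases, for illustration only.
CASES = [
    {"question": "I fight with my parents every day", "answer": "A"},
    {"question": "my exam grades make me anxious", "answer": "B"},
    {"question": "my friends ignore me at school", "answer": "C"},
    {"question": "I cannot sleep before every exam", "answer": "D"},
]

if __name__ == "__main__":
    query = "I am anxious about my exam grades"
    print(build_prompt(query, retrieve_top_k(query, CASES)))
```

A non-RAG baseline would send only the new question to the generator, which is the comparison the abstract reports.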
4.Corrigendum to: Development and Verification of Time-Series Deep Learning for Drug-Induced Liver Injury Detection in Patients Taking Angiotensin II Receptor Blockers: A Multicenter Distributed Research Network Approach
Suncheol HEO ; Jae Yong YU ; Eun Ae KANG ; Hyunah SHIN ; Kyeongmin RYU ; Chungsoo KIM ; Yebin CHEGAL ; Hyojung JUNG ; Suehyun LEE ; Rae Woong PARK ; Kwangsoo KIM ; Yul HWANGBO ; Jae-Hyun LEE ; Yu Rang PARK
Healthcare Informatics Research 2024;30(2):168-168
6.Polypharmacy and Elevated Risk of Severe Adverse Events in Older Adults Based on the Korea Institute of Drug Safety and Risk Management-Korea Adverse Event Reporting System Database
Grace Juyun KIM ; Ji Sung LEE ; Sujung JANG ; Seonghui LEE ; Seongwoo JEON ; Suehyun LEE ; Ju Han KIM ; Kye Hwa LEE
Journal of Korean Medical Science 2024;39(28):e205-
Background:
Older adults are at a higher risk of severe adverse drug events (ADEs) because of multimorbidity, polypharmacy, and lower physiological function. This study aimed to determine whether polypharmacy, defined as the use of ≥ 5 active drug ingredients, was associated with severe ADEs in this population.
Methods:
We used ADE reports from the Korea Institute of Drug Safety and Risk Management-Korea Adverse Event Reporting System Database, a national spontaneous ADE report system, from 2012 to 2021 to examine and compare the strength of association between polypharmacy and severe ADEs in older adults (≥ 65 years) and younger adults (20–64 years) using disproportionality analysis.
Results:
We found a significant association between polypharmacy and severe ADEs in the cardiac and renal/urinary Medical Dictionary for Regulatory Activities System Organ Classes (MedDRA SOCs) in older adults. Among the individual ADEs included in these MedDRA SOCs, acute cardiac arrest and renal failure were more significantly associated with polypharmacy in older adults than in younger adults.
Conclusion:
The addition of new drugs to the regimens of older adults warrants close monitoring of renal and cardiac symptoms.
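The abstract does not state which disproportionality measure was used. As one common choice, the reporting odds ratio (ROR) with a 95% confidence interval can be computed from a 2×2 table of report counts. The sketch below assumes that measure, and the function name and counts are hypothetical, chosen only to show the arithmetic.

```python
import math


def reporting_odds_ratio(a, b, c, d, z=1.96):
    """Reporting odds ratio with a Wald confidence interval on the log scale.

    2x2 table of spontaneous reports (hypothetical layout):
        a: exposed (e.g. polypharmacy) reports with the event of interest
        b: exposed reports without the event
        c: unexposed reports with the event
        d: unexposed reports without the event
    All four counts must be positive.
    """
    ror = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - z * se_log)
    upper = math.exp(math.log(ror) + z * se_log)
    return ror, lower, upper


if __name__ == "__main__":
    # Invented counts, for illustration only.
    ror, lower, upper = reporting_odds_ratio(20, 80, 10, 90)
    print(f"ROR = {ror:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A signal is conventionally flagged when the lower confidence bound exceeds 1. Comparing the ROR computed separately for older and younger adults is one way to compare the strength of association between the two groups.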
7.Development and Verification of Time-Series Deep Learning for Drug-Induced Liver Injury Detection in Patients Taking Angiotensin II Receptor Blockers: A Multicenter Distributed Research Network Approach
Suncheol HEO ; Jae Yong YU ; Eun Ae KANG ; Hyunah SHIN ; Kyeongmin RYU ; Chungsoo KIM ; Yebin CHEGAL ; Hyojung JUNG ; Suehyun LEE ; Rae Woong PARK ; Kwangsoo KIM ; Yul HWANGBO ; Jae-Hyun LEE ; Yu Rang PARK
Healthcare Informatics Research 2023;29(3):246-255
Objectives:
The objective of this study was to develop and validate a multicenter-based, multi-model, time-series deep learning model for predicting drug-induced liver injury (DILI) in patients taking angiotensin receptor blockers (ARBs). The study leveraged a national-level multicenter approach, utilizing electronic health records (EHRs) from six hospitals in Korea.
Methods:
A retrospective cohort analysis was conducted using EHRs from six hospitals in Korea, comprising a total of 10,852 patients whose data were converted to the Common Data Model. The study assessed the incidence rate of DILI among patients taking ARBs and compared it to a control group. Temporal patterns of important variables were analyzed using an interpretable time-series model.
Results:
The overall incidence rate of DILI among patients taking ARBs was found to be 1.09%. The incidence rates varied for each specific ARB drug and institution, with valsartan having the highest rate (1.24%) and olmesartan having the lowest rate (0.83%). The DILI prediction models showed varying performance, measured by the average area under the receiver operating characteristic curve, with telmisartan (0.93), losartan (0.92), and irbesartan (0.90) exhibiting higher classification performance. The aggregated attention scores from the models highlighted the importance of variables such as hematocrit, albumin, prothrombin time, and lymphocytes in predicting DILI.
Conclusions:
Implementing a multicenter-based time-series classification model provided evidence that could be valuable to clinicians regarding temporal patterns associated with DILI in ARB users. This information supports informed decisions regarding appropriate drug use and treatment strategies.
8.A Study on Methodologies of Drug Repositioning Using Biomedical Big Data: A Focus on Diabetes Mellitus
Suehyun LEE ; Seongwoo JEON ; Hun-Sung KIM
Endocrinology and Metabolism 2022;37(2):195-207
Drug repositioning is a strategy for identifying new applications of an existing drug that has been previously proven to be safe. Based on several examples of drug repositioning, we aimed to determine the methodologies and relevant steps associated with drug repositioning that should be pursued in the future. Reports on drug repositioning, retrieved from PubMed from January 2011 to December 2020, were classified based on an analysis of the methodology and reviewed by experts. Among various drug repositioning methods, the network-based approach was the most common (38.0%, 186/490 cases), followed by machine learning/deep learning-based (34.3%, 168/490 cases), text mining-based (7.1%, 35/490 cases), semantic-based (5.3%, 26/490 cases), and others (15.3%, 75/490 cases). Although drug repositioning offers several advantages, its implementation is curtailed by the need for prior, conclusive clinical proof. This approach requires the construction of various databases, and a deep understanding of the process underlying repositioning is essential. An in-depth understanding of drug repositioning could reduce the time, cost, and risks inherent to early drug development, providing reliable scientific evidence. Furthermore, regarding patient safety, drug repurposing might allow the discovery of new relationships between drugs and diseases.
9.Methodological Round: Prospect of Artificial Intelligence Based on Electronic Medical Record
Journal of Lipid and Atherosclerosis 2021;10(3):282-290
With the advent of the big data era, the international community is increasingly interested in expanding the use of medical big data. Many hospitals are attempting to increase the efficiency of their operations and patient management by adopting artificial intelligence (AI) technology that enables the use of electronic medical record (EMR) data. EMR includes information about a patient's health history, such as diagnoses, medicines, tests, allergies, immunizations, treatment plans, personalized medical care, and improvement of medical quality and safety. EMR data can also be used for AI-based new drug development. In particular, it is effective to develop AI that can predict the occurrence of specific diseases or provide individualized treatments by classifying the characteristics of individual patients. To improve the performance of AI research using EMR data, standardization and refinement of data are essential. In addition, since EMR data deal with sensitive personal information of patients, it is also vital to protect the patient's privacy. The Korean government already offers various forms of support for the use of EMR data, and researchers are encouraged to be proactive.
10.Real-world Evidence versus Randomized Controlled Trial: Clinical Research Based on Electronic Medical Records.
Hun Sung KIM ; Suehyun LEE ; Ju Han KIM
Journal of Korean Medical Science 2018;33(34):e213-
Real-world evidence (RWE) and randomized controlled trial (RCT) data are considered mutually complementary. However, compared with RCT, the outcomes of RWE continue to be assigned lower credibility. It must be emphasized that RWE research is a real-world practice that does not need to be executed as RCT research for it to be reliable. The advantages and disadvantages of RWE must be discerned clearly, and then the proper protocol can be planned from the beginning of the research to secure as many samples as possible. Attention must be paid to privacy protection. Moreover, bias can be reduced meaningfully by reducing the number of dropouts through detailed and meticulous data quality management. RCT research, characterized as having the highest reliability, and RWE research, which reflects the actual clinical aspects, can have a mutually supplementary relationship. Indeed, once this is proven, the two could comprise the most powerful evidence-based research method in medicine.
Bias (Epidemiology) ; Data Accuracy ; Electronic Health Records* ; Methods ; Privacy
