Performance of a prompt engineering method for extracting individual risk factors of precocious puberty from electronic medical records.
10.11817/j.issn.1672-7347.2025.240651
- Author:
Feixiang ZHOU 1;
Taowei ZHONG 2;
Guiyan YANG 2;
Xianglong DING 2;
Yan YAN 3
Author Information
1. Department of Epidemiology and Health Statistics, Xiangya School of Public Health, Central South University, Changsha 410013, China. zhoufeixiang@csu.edu.cn.
2. Department of Epidemiology and Health Statistics, Xiangya School of Public Health, Central South University, Changsha 410013, China.
3. Department of Epidemiology and Health Statistics, Xiangya School of Public Health, Central South University, Changsha 410013, China. yanyan@csu.edu.cn.
- Publication Type:Journal Article
- Keywords:
electronic medical record;
information extraction;
large language model;
precocious puberty;
prompt engineering
- MeSH:
Humans;
Puberty, Precocious/epidemiology*;
Risk Factors;
Electronic Health Records;
Female;
Child;
Natural Language Processing
- From:
Journal of Central South University(Medical Sciences)
2025;50(7):1224-1233
- Country:China
- Language:Chinese
- Abstract:
OBJECTIVES:Accurate identification of risk factors for precocious puberty is essential for clinical diagnosis and management, yet the performance of natural language processing methods applied to unstructured electronic medical record (EMR) data remains to be fully evaluated. This study aims to assess the performance of a prompt engineering method for extracting individual risk factors of precocious puberty from EMRs.
METHODS:Based on the capacity and role-insight-statement-personality-experiment (CRISPE) prompt framework, both simple and optimized prompts were designed to guide the large language model GLM-4-9B in extracting 10 types of risk factors for precocious puberty from 653 EMRs. Accuracy, precision, recall, and F1-score were used as evaluation metrics for the information extraction task.
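The four evaluation metrics can be illustrated with a minimal sketch. It assumes each risk factor is scored as a binary present/absent label per record; the abstract does not state how labels were encoded or how results were pooled across the 10 risk factors, so the function name `extraction_metrics` and the toy data are hypothetical.

```python
from typing import Dict, Sequence


def extraction_metrics(truth: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for one binary extraction target.

    Assumption: True means the risk factor is recorded as present in the EMR.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy example (made-up labels, not study data).
toy_truth = [True, True, True, False, False]
toy_predicted = [True, True, False, False, True]
toy_result = extraction_metrics(toy_truth, toy_predicted)
```

High precision with lower recall, as reported for the simple prompts, corresponds to few false positives but many missed mentions (false negatives) in this formulation.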
RESULTS:With simple versus optimized prompts, the model's overall accuracy, precision, recall, and F1-score were 84.18%, 98.09%, 81.99%, and 89.32% versus 97.15%, 98.31%, 98.16%, and 98.23%, respectively. Optimized prompts also performed more stably than simple prompts across age (<9 years vs ≥9 years) and visit-time (<2023 vs ≥2023) subgroups. Accuracy for individual risk factors ranged from 60.03% to 97.24% with simple prompts and from 92.19% to 99.85% with optimized prompts. The largest improvement occurred for "beverage intake" (60.03% vs 92.19%) and the smallest for "maternal age of menarche" (97.24% vs 99.23%). When distributions were compared among simple prompts, optimized prompts, and ground truth, statistically significant differences were observed for snack intake, beverage intake, soy milk intake, honey intake, supplement use, tonic use, sleep quality, and sleeping with the light on (all P<0.001), whereas exercise (P=0.966) and maternal age of menarche (P=0.952) showed no significant differences.
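As a quick consistency check, the reported overall F1-scores can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below uses only the percentages given in the abstract; the helper name `f1_from` is hypothetical.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall as reported in the abstract (percent).
f1_simple = f1_from(98.09, 81.99)
f1_optimized = f1_from(98.31, 98.16)

# Absolute gain in overall accuracy, in percentage points.
accuracy_gain = 97.15 - 84.18

print(f"simple F1: {f1_simple:.2f}")
print(f"optimized F1: {f1_optimized:.2f}")
print(f"accuracy gain: {accuracy_gain:.2f} percentage points")
```

Both recomputed values agree with the reported 89.32% and 98.23% to within rounding, and the gain is driven almost entirely by recall (81.99% to 98.16%) rather than precision.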
CONCLUSIONS:Compared with simple prompts, optimized prompts substantially improved the extraction performance of individual risk factors for precocious puberty from EMRs, underscoring the critical role of prompt engineering in enhancing large language model performance.