1.Performance of a prompt engineering method for extracting individual risk factors of precocious puberty from electronic medical records.
Feixiang ZHOU ; Taowei ZHONG ; Guiyan YANG ; Xianglong DING ; Yan YAN
Journal of Central South University(Medical Sciences) 2025;50(7):1224-1233
OBJECTIVES:
Accurate identification of risk factors for precocious puberty is essential for clinical diagnosis and management, yet the performance of natural language processing methods applied to unstructured electronic medical record (EMR) data remains to be fully evaluated. This study aims to assess the performance of a prompt engineering method for extracting individual risk factors of precocious puberty from EMRs.
METHODS:
Based on the capacity and role-insight-statement-personality-experiment (CRISPE) prompt framework, both simple and optimized prompts were designed to guide the large language model GLM-4-9B in extracting 10 types of risk factors for precocious puberty from 653 EMRs. Accuracy, precision, recall, and F1-score were used as evaluation metrics for the information extraction task.
RESULTS:
Under simple and optimized prompt conditions, the overall accuracy, precision, recall, and F1-score of the model were 84.18%, 98.09%, 81.99%, and 89.32% versus 97.15%, 98.31%, 98.16%, and 98.23%, respectively. The optimized prompts achieved more stable performance across age (<9 years vs ≥9 years) and visit-time (<2023 vs ≥2023) subgroups compared with simple prompts. With simple prompts, the accuracy for extracting each risk factor ranged from 60.03% to 97.24%; with optimized prompts, the range improved to 92.19%-99.85%. The largest performance improvement occurred for "beverage intake" (60.03% vs 92.19%), and the smallest for "maternal age at menarche" (97.24% vs 99.23%). In comparing distributions among simple prompts, optimized prompts, and ground truth, statistically significant differences were observed for snack intake, beverage intake, soy milk intake, honey intake, supplement use, tonic use, sleep quality, and sleeping with the light on (all P<0.001), while exercise (P=0.966) and maternal age at menarche (P=0.952) showed no significant differences.
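The reported overall F1-scores can be checked against the reported precision and recall, since F1 is their harmonic mean. The following is a minimal sketch of that consistency check, not the authors' evaluation code; the function and variable names are illustrative assumptions, and only the percentages come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision, recall, and F1 (in percent) as reported in the abstract.
# The dictionary layout is an assumption made for this illustration.
reported = {
    "simple": {"precision": 98.09, "recall": 81.99, "f1": 89.32},
    "optimized": {"precision": 98.31, "recall": 98.16, "f1": 98.23},
}


def recomputed_f1_percent(condition: str) -> float:
    """Recompute F1 (in percent) from the reported precision and recall."""
    row = reported[condition]
    return 100 * f1_score(row["precision"] / 100, row["recall"] / 100)


if __name__ == "__main__":
    for name, row in reported.items():
        print(
            f"{name}: reported F1 = {row['f1']:.2f}%, "
            f"recomputed = {recomputed_f1_percent(name):.2f}%"
        )
```

Under both prompt conditions the recomputed value agrees with the reported F1-score to within rounding of the published percentages.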
CONCLUSIONS:
Compared with simple prompts, optimized prompts substantially improved the extraction performance of individual risk factors for precocious puberty from EMRs, underscoring the critical role of prompt engineering in enhancing large language model performance.
Humans; Puberty, Precocious/epidemiology*; Risk Factors; Electronic Health Records; Female; Child; Natural Language Processing
2.Association of BHMT and BHMT2 gene polymorphisms with non-syndromic congenital heart disease: a case-control study
Jiapeng TANG ; Jun OU ; Yige CHEN ; Mengting SUN ; Manjun LUO ; Qian CHEN ; Taowei ZHONG ; Jianhui WEI ; Tingting WANG ; Jiabi QIN
Chinese Journal of Preventive Medicine 2024;58(4):497-507
Objective: To explore the association of human betaine-homocysteine methyltransferase (BHMT) and BHMT2 gene polymorphisms with non-syndromic congenital heart disease (CHD). Methods: A hospital-based case-control study was conducted in which children with CHD who attended Hunan Children's Hospital from January 2018 to May 2019 were enrolled as the case group, and children without any congenital deformity who attended the hospital during the same period were enrolled as the control group on a 1:1 basis. A self-administered questionnaire survey was performed to collect information about the study subjects and their mothers, and venous blood samples were then collected from the subjects to detect BHMT and BHMT2 gene polymorphisms. Logistic regression analyses were used to evaluate the association of BHMT and BHMT2 gene polymorphisms and their haplotypes with CHD. Crossover analyses and logistic regression were used to explore gene-gene and gene-environment interactions. Results: The case and control groups each enrolled 620 children. Multivariate logistic regression showed that BHMT gene polymorphisms at rs3733890 (AA vs. GG: OR=3.476, Q FDR<0.001; GA vs. GG: OR=1.525, Q FDR=0.036), at rs1915706 (CC vs. TT: OR=3.464, Q FDR<0.001) and at rs1316753 (GG vs. CC: OR=1.875, Q FDR=0.020) increased the risk of CHD. Children with the A-G-A haplotype had an increased risk of CHD (OR=1.468, 95% CI: 1.222-1.762). Interaction analysis showed a statistically significant positive interaction between rs3733890 and rs1915706 on both the additive (RERI=0.628, 95% CI: 0.298-0.958) and multiplicative (OR=3.754, 95% CI: 1.875-7.519) scales. Gene-environment interactions were found between the BHMT gene and secondhand smoke exposure before pregnancy and in early pregnancy, tea consumption before pregnancy and in early pregnancy, alcohol consumption before pregnancy, and folic acid supplementation before or during pregnancy.
Conclusion: BHMT gene rs3733890, rs1915706 and rs1316753 polymorphisms may be associated with the risk of CHD. In addition, a positive interaction between rs3733890 and rs1915706, on both additive and multiplicative scales, is associated with the risk of CHD, and the BHMT gene interacts with multiple environmental factors.
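The additive-scale measure cited above, the relative excess risk due to interaction (RERI), has a standard definition in terms of three odds ratios from a crossover table: exposure to both factors, and to each factor alone, each relative to the doubly unexposed group. The sketch below illustrates that definition only. The abstract does not report the stratum-specific odds ratios or describe the exact computation, so the input values and function names are hypothetical.

```python
def reri(or_a_only: float, or_b_only: float, or_both: float) -> float:
    """Relative excess risk due to interaction (additive scale).

    RERI = OR_11 - OR_10 - OR_01 + 1, where the reference group is
    unexposed to both factors. RERI > 0 suggests a positive additive interaction.
    """
    return or_both - or_a_only - or_b_only + 1.0


def multiplicative_ratio(or_a_only: float, or_b_only: float, or_both: float) -> float:
    """Ratio OR_11 / (OR_10 * OR_01); values above 1 suggest a positive
    interaction on the multiplicative scale."""
    return or_both / (or_a_only * or_b_only)


if __name__ == "__main__":
    # Hypothetical odds ratios chosen for illustration; NOT taken from the study.
    or_a_only, or_b_only, or_both = 1.5, 2.0, 4.0
    print(f"RERI = {reri(or_a_only, or_b_only, or_both):.3f}")
    print(f"Multiplicative ratio = {multiplicative_ratio(or_a_only, or_b_only, or_both):.3f}")
```

With these hypothetical inputs both measures indicate a positive interaction, which is the same qualitative pattern the abstract reports for rs3733890 and rs1915706.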
