1.Adherence of Studies on Large Language Models for Medical Applications Published in Leading Medical Journals According to the MI-CLEAR-LLM Checklist
Ji Su KO ; Hwon HEO ; Chong Hyun SUH ; Jeho YI ; Woo Hyun SHIM
Korean Journal of Radiology 2025;26(4):304-312
Objective:
To evaluate the adherence of large language model (LLM)-based healthcare research to the Minimum Reporting Items for Clear Evaluation of Accuracy Reports of Large Language Models in Healthcare (MI-CLEAR-LLM) checklist, a framework designed to enhance the transparency and reproducibility of studies on the accuracy of LLMs for medical applications.
Materials and Methods:
A systematic PubMed search was conducted to identify articles on LLM performance published in high-ranking clinical medicine journals (the top 10% in each of the 59 specialties according to the 2023 Journal Impact Factor) from November 30, 2022, through June 25, 2024. Two reviewers independently extracted data on the six MI-CLEAR-LLM checklist items: 1) identification and specification of the LLM used, 2) stochasticity handling, 3) prompt wording and syntax, 4) prompt structuring, 5) prompt testing and optimization, and 6) independence of the test data. Adherence was calculated for each item.
Results:
Of 159 studies, 100% (159/159) reported the name of the LLM, 96.9% (154/159) reported the version, and 91.8% (146/159) reported the manufacturer. However, only 54.1% (86/159) reported the training data cutoff date, 6.3% (10/159) documented whether the model had access to web-based information, and 50.9% (81/159) provided the dates of the query attempts. Clear documentation of stochasticity management was provided in 15.1% (24/159) of the studies. Regarding prompt details, 49.1% (78/159) provided the exact prompt wording and syntax, but only 34.0% (54/159) documented prompt-structuring practices. While 46.5% (74/159) of the studies detailed prompt testing, only 15.7% (25/159) explained the rationale for specific word choices. Test data independence was reported in only 13.2% (21/159) of the studies, and among the studies that used internet-sourced test data, 56.6% (43/76) provided the corresponding URLs.
Conclusion:
Although basic LLM identification details were relatively well reported, other key aspects, including stochasticity, prompts, and test data, were frequently underreported. Better adherence to the MI-CLEAR-LLM checklist would make LLM research more transparent and foster more credible and reliable future studies.
2.Licochalcone D Inhibits Skin Epidermal Cell Transformation through the Regulation of AKT Signaling Pathways
Sun-Young HWANG ; Kwanhwan WI ; Goo YOON ; Cheol-Jung LEE ; Soong-In LEE ; Jong-gil JUNG ; Hyun-Woo JEONG ; Jeong-Sang KIM ; Chan-Heon CHOI ; Chang-Su NA ; Jung-Hyun SHIM ; Mee-Hyun LEE
Biomolecules & Therapeutics 2023;31(6):682-691
Cell transformation induced by epidermal growth factor (EGF) and 12-O-tetradecanoylphorbol-13-acetate (TPA) is a critical event in cancer initiation and progression, and understanding the underlying mechanisms is essential for developing new therapeutic strategies. Licorice extract contains various bioactive compounds that have been reported to have anticancer and anti-inflammatory effects. This study investigated the cancer-preventive efficacy of licochalcone D (LicoD), a chalcone derivative in licorice extract, in EGF- and TPA-induced transformed skin keratinocytes. LicoD effectively suppressed EGF-induced cell proliferation and anchorage-independent colony growth. EGF and TPA promoted progression to the S phase of the cell cycle, whereas LicoD treatment caused G1-phase arrest, down-regulating cyclin D1 and up-regulating p21, both of which are associated with the G1 phase. LicoD also induced apoptosis and increased the levels of apoptosis-related proteins such as cleaved caspase-3, cleaved caspase-7, and Bax (Bcl-2-associated X protein). We further investigated the effect of LicoD on the AKT signaling pathway, which is involved in various cellular processes, and found decreased expression of p-AKT, p-GSK3β, and p-NF-κB. Treatment with MK-2206, a pharmacological AKT inhibitor, likewise suppressed EGF-induced cell proliferation and transformed colony growth. In conclusion, this study demonstrated the potential of LicoD as a preventive agent against skin carcinogenesis.
3.Diagnostic Yield of Diffusion-Weighted Brain Magnetic Resonance Imaging in Patients with Transient Global Amnesia: A Systematic Review and Meta-Analysis
Su Jin LIM ; Minjae KIM ; Chong Hyun SUH ; Sang Yeong KIM ; Woo Hyun SHIM ; Sang Joon KIM
Korean Journal of Radiology 2021;22(10):1680-1689
Objective:
To investigate the diagnostic yield of diffusion-weighted imaging (DWI) in patients with transient global amnesia (TGA) and identify significant parameters affecting diagnostic yield.
Materials and Methods:
A systematic literature search of the MEDLINE and EMBASE databases was conducted to identify studies that assessed the diagnostic yield of DWI in patients with TGA. The pooled diagnostic yield of DWI in patients with TGA was calculated using the DerSimonian-Laird random-effects model. Subgroup analyses were also performed according to slice thickness, magnetic field strength, and the interval between symptom onset and DWI.
Results:
Twenty-two original articles (1732 patients) were included. The pooled incidences of right, left, and bilateral hippocampal lesions were 37% (95% confidence interval [CI], 30–44%), 42% (95% CI, 39–46%), and 25% (95% CI, 20–30%) of all lesions, respectively. The pooled diagnostic yield of DWI in patients with TGA was 39% (95% CI, 27–52%). The Higgins I² statistic showed significant heterogeneity (I² = 95%). DWI with a slice thickness of ≤ 3 mm showed a higher diagnostic yield than DWI with a slice thickness of > 3 mm (pooled diagnostic yield: 63% [95% CI, 53–72%] vs. 26% [95% CI, 16–40%], p < 0.01). DWI performed 24 to 96 hours after symptom onset showed a higher diagnostic yield (68% [95% CI, 57–78%], p < 0.01) than DWI performed within 24 hours (16% [95% CI, 7–34%]) or later than 96 hours (15% [95% CI, 8–26%]). There was no difference in diagnostic yield between DWI performed at 3T and at 1.5T (pooled diagnostic yield: 31% [95% CI, 25–38%] vs. 24% [95% CI, 14–37%], p = 0.31).
Conclusion:
The pooled diagnostic yield of DWI in patients with TGA was 39%. Acquiring DWI with a slice thickness of ≤ 3 mm, or more than 24 and up to 96 hours after symptom onset, could increase the diagnostic yield.
4.Knot-Tying versus Knotless Suture Anchors for Arthroscopic Bankart Repair: A Comparative Study
Jae Woo SHIM ; Tae Wan JUNG ; Il Su KIM ; Jae Chul YOO
Yonsei Medical Journal 2021;62(8):743-749
Purpose:
This study aimed to compare the results of using knotless and knot-tying suture anchors in arthroscopic Bankart repair.
Materials and Methods:
Patients who underwent arthroscopic Bankart repair with knot-tying or knotless suture anchors between 2011 and 2017 were retrospectively reviewed. We collected demographic data, clinical scores (pain visual analogue scale, functional visual analogue scale, American Shoulder and Elbow Surgeons score, and Rowe score), and range of motion (ROM). Rates of re-dislocation and positive subjective anterior apprehension tests were also compared between the two techniques.
Results:
Of the 154 patients who underwent arthroscopic Bankart repair, 115 (knot-tying group, n=61; knotless group, n=54) were included in this study. Of the 115 patients, 102 were male and 13 were female. The mean patient age was 27 years (range: 17–60), and the mean follow-up period was 43 months (range: 24–99). There were no significant differences in the final clinical scores or ROM between the two groups. Re-dislocation was observed in 6 (9.8%) and 4 (7.3%) patients in the knot-tying and knotless groups, respectively. Apprehension was observed in 11 (18.0%) and 12 (22.2%) patients in the knot-tying and knotless groups, respectively. There were no significant differences between the two groups with regard to re-dislocation or anterior apprehension.
Conclusion:
Re-dislocation rates and clinical scores were similar between knotless and knot-tying suture anchors in arthroscopic Bankart repair after a minimum 2-year follow-up.
