2. Plasma fractionation in Korea: working towards self-sufficiency.
Quehn PARK ; Moon Jung KIM ; Jaeseung LEE ; Sunmi SHIN
Korean Journal of Hematology 2010;45(1):3-5
No abstract available.
Plasma
4. Image Quality and Lesion Detectability of Lower-Dose Abdominopelvic CT Obtained Using Deep Learning Image Reconstruction
June PARK ; Jaeseung SHIN ; In Kyung MIN ; Heejin BAE ; Yeo-Eun KIM ; Yong Eun CHUNG
Korean Journal of Radiology 2022;23(4):402-412
Objective:
To evaluate the image quality and lesion detectability of lower-dose CT (LDCT) of the abdomen and pelvis obtained using a deep learning image reconstruction (DLIR) algorithm compared with those of standard-dose CT (SDCT) images.
Materials and Methods:
This retrospective study included 123 patients (mean age ± standard deviation, 63 ± 11 years; male:female, 70:53) who underwent contrast-enhanced abdominopelvic LDCT between May and August 2020 and had prior SDCT obtained using the same CT scanner within a year. LDCT images were reconstructed with hybrid iterative reconstruction (h-IR) and DLIR at medium and high strengths (DLIR-M and DLIR-H), while SDCT images were reconstructed with h-IR. For quantitative image quality analysis, image noise, signal-to-noise ratio, and contrast-to-noise ratio were measured in the liver, muscle, and aorta. Among the three different LDCT reconstruction algorithms, the one showing the smallest difference in quantitative parameters from those of SDCT images was selected for qualitative image quality analysis and lesion detectability evaluation. For qualitative analysis, overall image quality, image noise, image sharpness, image texture, and lesion conspicuity were graded using a 5-point scale by two radiologists. Observer performance in focal liver lesion detection was evaluated by comparing the jackknife free-response receiver operating characteristic figures-of-merit (FOM).
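For reference, SNR and CNR in such studies are derived from region-of-interest (ROI) measurements of mean attenuation and noise (standard deviation). A minimal sketch of the calculations, assuming illustrative ROI values in Hounsfield units (the variable names and numbers below are not taken from the study):

def snr(roi_mean_hu, roi_sd_hu):
    """Signal-to-noise ratio: mean ROI attenuation divided by image noise (SD)."""
    return roi_mean_hu / roi_sd_hu

def cnr(lesion_mean_hu, background_mean_hu, background_sd_hu):
    """Contrast-to-noise ratio: attenuation difference over background noise."""
    return (lesion_mean_hu - background_mean_hu) / background_sd_hu

# Illustrative values only: contrast-enhanced liver vs. paraspinal muscle
print(snr(110.0, 12.5))        # liver SNR
print(cnr(110.0, 55.0, 12.5))  # liver-to-muscle CNR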
Results:
LDCT (35.1% dose reduction compared with SDCT) images obtained using DLIR-M showed quantitative measures similar to those of SDCT with h-IR images. Apart from image texture, all qualitative parameters of LDCT with DLIR-M images were similar to or significantly better than those of SDCT with h-IR images. The lesion detectability on LDCT with DLIR-M images was not significantly different from that on SDCT with h-IR images (reader-averaged FOM, 0.887 vs. 0.874; p = 0.581).
Conclusion:
Overall image quality and the detectability of focal liver lesions are preserved in contrast-enhanced abdominopelvic LDCT obtained with DLIR-M, relative to SDCT with h-IR.
5. Diagnostic performance of multimodal large language models in radiological quiz cases: the effects of prompt engineering and input conditions
Taewon HAN ; Woo Kyoung JEONG ; Jaeseung SHIN
Ultrasonography 2025;44(3):220-231
Purpose:
This study aimed to evaluate the diagnostic accuracy of three multimodal large language models (LLMs) in radiological image interpretation and to assess the impact of prompt engineering strategies and input conditions.
Methods:
This study analyzed 67 radiological quiz cases from the Korean Society of Ultrasound in Medicine. Three multimodal LLMs (Claude 3.5 Sonnet, GPT-4o, and Gemini-1.5-Pro-002) were evaluated using six types of prompts (basic [without system prompt], original [specific instructions], chain-of-thought, reflection, multiagent, and artificial intelligence [AI]–generated). Performance was assessed across various factors, including tumor versus non-tumor status, case rarity, difficulty, and knowledge cutoff dates. A subgroup analysis compared diagnostic accuracy between imaging-only inputs and combined imaging-descriptive text inputs.
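As background, the prompt-type comparison amounts to running every case through each model under each prompt and tallying accuracy. A minimal sketch of that loop, in which PROMPTS only paraphrases the prompt categories and query_model is a hypothetical stand-in for the vendor multimodal APIs (neither reproduces the study's actual prompt wording, API calls, or answer adjudication):

# Hypothetical prompt templates; not the study's actual wording.
PROMPTS = {
    "basic": "What is the most likely diagnosis?",
    "chain_of_thought": "Describe the imaging findings step by step, "
                        "then state the most likely diagnosis.",
    "reflection": "Propose a diagnosis, critique it, then give a final answer.",
}

def evaluate(query_model, cases):
    """Per-prompt accuracy over quiz cases.

    query_model(prompt, image) -> str is a placeholder for an
    Anthropic/OpenAI/Google multimodal API call; the substring match
    below is a simplification of real answer adjudication.
    """
    accuracy = {}
    for name, prompt in PROMPTS.items():
        correct = sum(
            case["answer"].lower() in query_model(prompt, case["image"]).lower()
            for case in cases
        )
        accuracy[name] = correct / len(cases)
    return accuracy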
Results:
With imaging-only inputs, Claude 3.5 Sonnet achieved the highest overall accuracy (46.3%, 186/402), followed by GPT-4o (43.5%, 175/402) and Gemini-1.5-Pro-002 (39.8%, 160/402). AI-generated prompts yielded superior combined accuracy across all three models, with significant improvements over the basic (7.96%, P=0.009), chain-of-thought (6.47%, P=0.029), and multiagent prompts (5.97%, P=0.043). The integration of descriptive text significantly enhanced diagnostic accuracy for Claude 3.5 Sonnet (46.3% to 66.2%, P<0.001), GPT-4o (43.5% to 57.5%, P<0.001), and Gemini-1.5-Pro-002 (39.8% to 60.4%, P<0.001). Model performance was significantly influenced by case rarity for GPT-4o (rare: 6.7% vs. nonrare: 53.9%, P=0.001) and by knowledge cutoff dates for Claude 3.5 Sonnet (post-cutoff: 23.5% vs. pre-cutoff: 64.0%, P=0.005).
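For orientation, the imaging-only versus imaging-plus-text gap can be sanity-checked with a two-proportion test. The abstract does not state the study's exact statistical method, and a paired comparison (e.g., McNemar's test) would better match the repeated-cases design, so the unpaired z-test below is only a rough illustration using the reported Claude 3.5 Sonnet counts:

import numpy as np
from statsmodels.stats.proportion import proportions_ztest

correct = np.array([186, 266])  # 46.3% and 66.2% of 402 prompt-case trials
trials = np.array([402, 402])
stat, pval = proportions_ztest(correct, trials)
print(f"z = {stat:.2f}, p = {pval:.4g}")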
Conclusion:
Claude 3.5 Sonnet achieved the highest diagnostic accuracy in radiological quiz cases, followed by GPT-4o and Gemini-1.5-Pro-002. The use of AI-generated prompts and the integration of descriptive text inputs enhanced model performance.
10. Contrast-enhanced ultrasound Liver Imaging Reporting and Data System category M: a systematic review and meta-analysis
Jaeseung SHIN ; Sunyoung LEE ; Yeun-Yoon KIM ; Yong Eun CHUNG ; Jin-Young CHOI ; Mi-Suk PARK
Ultrasonography 2022;41(1):74-82
Purpose:
A meta-analysis was conducted to determine the proportion of contrast-enhanced ultrasound (CEUS) Liver Imaging Reporting and Data System category M (LR-M) in hepatocellular carcinomas (HCCs) and non-HCC malignancies and to investigate the frequency of individual CEUS LR-M imaging features.
Methods:
The MEDLINE and Embase databases were searched from January 1, 2016, to July 23, 2020, for studies reporting the proportion of CEUS LR-M in HCCs and non-HCC malignancies. The meta-analytic pooled proportions of HCCs and non-HCC malignancies in the CEUS LR-M category were calculated. The meta-analytic frequencies of CEUS LR-M imaging features in non-HCC malignancies were also determined. Risk of bias and applicability were evaluated using the Quality Assessment of Diagnostic Accuracy Studies-2 tool.
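For context, pooled proportions of this kind are commonly estimated with a random-effects model on logit-transformed proportions. A minimal DerSimonian-Laird sketch, assuming per-study event counts and sample sizes (the study's actual software, transformation, and data are not reproduced here; the counts shown are illustrative):

import numpy as np
from scipy.special import expit, logit
from scipy.stats import norm

def pooled_proportion(events, n):
    """Random-effects (DerSimonian-Laird) pooled proportion with 95% CI."""
    events, n = np.asarray(events, float), np.asarray(n, float)
    y = logit(events / n)                    # per-study logit proportions
    v = 1.0 / events + 1.0 / (n - events)    # approximate logit variances
    w = 1.0 / v                              # fixed-effect weights
    q = np.sum(w * (y - np.sum(w * y) / np.sum(w)) ** 2)
    tau2 = max(0.0, (q - (len(y) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
    w_re = 1.0 / (v + tau2)                  # random-effects weights
    mu = np.sum(w_re * y) / np.sum(w_re)
    se = 1.0 / np.sqrt(np.sum(w_re))
    z = norm.ppf(0.975)
    return expit(mu), expit(mu - z * se), expit(mu + z * se)

# Illustrative counts only (not the included studies' data)
print(pooled_proportion([30, 22, 41], [55, 40, 78]))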
Results:
Twelve studies reporting the diagnostic performance of the CEUS LR-M category were identified, as well as seven studies reporting the frequencies of individual CEUS LR-M imaging features. The pooled proportions of HCC and non-HCC malignancies in the CEUS LR-M category were 54% (95% confidence interval [CI], 44% to 65%) and 40% (95% CI, 28% to 53%), respectively. The pooled frequencies of individual CEUS LR-M imaging features in non-HCC malignancies were 30% (95% CI, 17% to 45%) for rim arterial phase hyperenhancement, 79% (95% CI, 66% to 90%) for early (<60 s) washout, and 42% (95% CI, 21% to 64%) for marked washout.
Conclusion:
In total, 94% of CEUS LR-M lesions were malignancies, with HCCs representing 54% and non-HCC malignancies representing 40%. The frequencies of individual CEUS LR-M imaging features varied; early washout showed the highest frequency for non-HCC malignancies.