2.Plasma fractionation in Korea: working towards self-sufficiency.
Quehn PARK ; Moon Jung KIM ; Jaeseung LEE ; Sunmi SHIN
Korean Journal of Hematology 2010;45(1):3-5
No abstract available.
Keywords: Plasma
4.Image Quality and Lesion Detectability of Lower-Dose Abdominopelvic CT Obtained Using Deep Learning Image Reconstruction
June PARK ; Jaeseung SHIN ; In Kyung MIN ; Heejin BAE ; Yeo-Eun KIM ; Yong Eun CHUNG
Korean Journal of Radiology 2022;23(4):402-412
Objective:
To evaluate the image quality and lesion detectability of lower-dose CT (LDCT) of the abdomen and pelvis obtained using a deep learning image reconstruction (DLIR) algorithm compared with those of standard-dose CT (SDCT) images.
Materials and Methods:
This retrospective study included 123 patients (mean age ± standard deviation, 63 ± 11 years; male:female, 70:53) who underwent contrast-enhanced abdominopelvic LDCT between May and August 2020 and had prior SDCT obtained using the same CT scanner within a year. LDCT images were reconstructed with hybrid iterative reconstruction (h-IR) and DLIR at medium and high strengths (DLIR-M and DLIR-H), while SDCT images were reconstructed with h-IR. For quantitative image quality analysis, image noise, signal-to-noise ratio, and contrast-to-noise ratio were measured in the liver, muscle, and aorta. Of the three LDCT reconstruction algorithms, the one showing the smallest difference in quantitative parameters from SDCT images was selected for qualitative image quality analysis and lesion detectability evaluation. For qualitative analysis, overall image quality, image noise, image sharpness, image texture, and lesion conspicuity were graded by two radiologists on a 5-point scale. Observer performance in focal liver lesion detection was evaluated by comparing jackknife free-response receiver operating characteristic figures of merit (FOM).
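For context, the quantitative metrics above reduce to simple region-of-interest (ROI) statistics. The Python sketch below shows one common set of definitions; it is illustrative only (the study's measurement code is not published in this abstract), and the HU values in the usage example are invented.

import numpy as np

def roi_stats(image, mask):
    # Mean attenuation (HU) and image noise (SD of HU) within a boolean ROI mask.
    values = image[mask]
    return values.mean(), values.std(ddof=1)

def snr(mean_hu, noise_hu):
    # Signal-to-noise ratio: ROI mean attenuation divided by its own noise.
    return mean_hu / noise_hu

def cnr(mean_a, mean_b, noise_b):
    # Contrast-to-noise ratio: attenuation difference between two tissues,
    # divided by the noise of the reference tissue.
    return (mean_a - mean_b) / noise_b

# Invented HU values for illustration:
print(snr(110.0, 12.0))         # liver SNR, about 9.2
print(cnr(180.0, 60.0, 12.0))   # aorta vs. muscle CNR = 10.0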
Results:
LDCT images (35.1% dose reduction compared with SDCT) obtained using DLIR-M showed quantitative measures similar to those of SDCT with h-IR images. All qualitative parameters of LDCT with DLIR-M images except image texture were similar to or significantly better than those of SDCT with h-IR images. Lesion detectability on LDCT with DLIR-M images was not significantly different from that on SDCT with h-IR images (reader-averaged FOM, 0.887 vs. 0.874, respectively; p = 0.581).
Conclusion:
Overall image quality and the detectability of focal liver lesions are preserved in contrast-enhanced abdominopelvic LDCT obtained with DLIR-M relative to SDCT with h-IR.
5.Diagnostic performance of multimodal large language models in radiological quiz cases: the effects of prompt engineering and input conditions
Taewon HAN ; Woo Kyoung JEONG ; Jaeseung SHIN
Ultrasonography 2025;44(3):220-231
Purpose:
This study aimed to evaluate the diagnostic accuracy of three multimodal large language models (LLMs) in radiological image interpretation and to assess the impact of prompt engineering strategies and input conditions.
Methods:
This study analyzed 67 radiological quiz cases from the Korean Society of Ultrasound in Medicine. Three multimodal LLMs (Claude 3.5 Sonnet, GPT-4o, and Gemini-1.5-Pro-002) were evaluated using six types of prompts (basic [without system prompt], original [specific instructions], chain-of-thought, reflection, multiagent, and artificial intelligence [AI]–generated). Performance was assessed across various factors, including tumor versus non-tumor status, case rarity, difficulty, and knowledge cutoff dates. A subgroup analysis compared diagnostic accuracy between imaging-only inputs and combined imaging plus descriptive-text inputs.
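Because the abstract does not reproduce the prompts or evaluation code, the following Python sketch only illustrates the factorial design it describes (three models by six prompt conditions). The function query_model and all prompt texts are hypothetical placeholders, not the study's materials.

# Hypothetical harness for the 3-model x 6-prompt design described above.
MODELS = ["claude-3.5-sonnet", "gpt-4o", "gemini-1.5-pro-002"]
PROMPTS = {
    "basic": None,  # no system prompt
    "original": "You are a radiologist; give the most likely diagnosis.",
    "chain_of_thought": "Reason step by step, then state the diagnosis.",
    "reflection": "Propose a diagnosis, critique it, then finalize.",
    "multiagent": "Answer as a panel of radiologists reaching consensus.",
    "ai_generated": "<prompt text produced by an LLM, per the study design>",
}

def evaluate(cases, query_model):
    # query_model(model, system_prompt, image) is a placeholder for the
    # vendor-specific API call; it should return a diagnosis string.
    results = {}
    for model in MODELS:
        for name, system_prompt in PROMPTS.items():
            correct = sum(
                query_model(model, system_prompt, case["image"]) == case["answer"]
                for case in cases
            )
            results[(model, name)] = correct / len(cases)
    return results

Note that 67 cases under six prompt conditions yields the 402 trials per model reported in the Results.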
Results:
With imaging-only inputs, Claude 3.5 Sonnet achieved the highest overall accuracy (46.3%, 186/402), followed by GPT-4o (43.5%, 175/402) and Gemini-1.5-Pro-002 (39.8%, 160/402). AI-generated prompts yielded superior combined accuracy across all three models, with significant improvements over the basic (7.96%, P=0.009), chain-of-thought (6.47%, P=0.029), and multiagent prompts (5.97%, P=0.043). The integration of descriptive text significantly enhanced diagnostic accuracy for Claude 3.5 Sonnet (46.3% to 66.2%, P<0.001), GPT-4o (43.5% to 57.5%, P<0.001), and Gemini-1.5-Pro-002 (39.8% to 60.4%, P<0.001). Model performance was significantly influenced by case rarity for GPT-4o (rare: 6.7% vs. nonrare: 53.9%, P=0.001) and by knowledge cutoff dates for Claude 3.5 Sonnet (post-cutoff: 23.5% vs. pre-cutoff: 64.0%, P=0.005).
Conclusion:
Claude 3.5 Sonnet achieved the highest diagnostic accuracy in radiological quiz cases, followed by GPT-4o and Gemini-1.5-Pro-002. The use of AI-generated prompts and the integration of descriptive text inputs enhanced model performance.
10.Construction of a Standard Dataset for Liver Tumors for Testing the Performance and Safety of Artificial Intelligence-Based Clinical Decision Support Systems
Seung-seob KIM ; Dong Ho LEE ; Min Woo LEE ; So Yeon KIM ; Jaeseung SHIN ; Jin-Young CHOI ; Byoung Wook CHOI
Journal of the Korean Radiological Society 2021;82(5):1196-1206
Purpose:
To construct a standard dataset of contrast-enhanced CT images of liver tumors to test the performance and safety of artificial intelligence (AI)-based algorithms for clinical decision support systems (CDSSs).
Materials and Methods:
A consensus group of medical experts in gastrointestinal radiology from four national tertiary institutions discussed the conditions to be included in a standard dataset. Seventy-five cases of hepatocellular carcinoma, 75 cases of metastasis, and 30–50 cases of benign lesions were retrieved from each institution, and the final dataset consisted of 300 cases of hepatocellular carcinoma, 300 cases of metastasis, and 183 cases of benign lesions. Only pathologically confirmed cases of hepatocellular carcinomas and metastases were enrolled. The medical experts retrieved the patients' medical records and manually labeled the CT images, which were saved as Digital Imaging and Communications in Medicine (DICOM) files.
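As a side note for readers building a similar dataset, the pydicom library can confirm that each file parses as valid DICOM while the manual label is recorded next to it. This is a hedged sketch under assumed conventions (the JSON sidecar, paths, and function name are illustrative; the study's actual labeling pipeline is not described at this level of detail):

import json
import pydicom  # pip install pydicom

LABELS = {"hepatocellular_carcinoma", "metastasis", "benign"}

def record_label(dicom_path, lesion_label, sidecar_path):
    # Parse the file (pydicom raises if it is not valid DICOM), then store
    # the expert-assigned label in a JSON sidecar keyed by SOP Instance UID.
    assert lesion_label in LABELS
    ds = pydicom.dcmread(dicom_path)
    record = {"sop_instance_uid": str(ds.SOPInstanceUID), "label": lesion_label}
    with open(sidecar_path, "w") as f:
        json.dump(record, f, indent=2)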
Results:
The medical experts in gastrointestinal radiology constructed the standard dataset of contrast-enhanced CT images for 783 cases of liver tumors. The performance and safety of the AI algorithm can be evaluated by calculating the sensitivity and specificity for detecting and characterizing the lesions.
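The sensitivity and specificity mentioned here are ordinary confusion-matrix ratios; a minimal worked example follows (the counts are invented for illustration, not results from this dataset):

def sensitivity(tp, fn):
    # True-positive rate: correctly detected lesions / all lesions present.
    return tp / (tp + fn)

def specificity(tn, fp):
    # True-negative rate: correctly cleared cases / all lesion-free cases.
    return tn / (tn + fp)

# Invented counts for illustration:
print(sensitivity(tp=270, fn=30))  # 0.90
print(specificity(tn=160, fp=23))  # about 0.87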
Conclusion:
The constructed standard dataset can be utilized to evaluate machine-learning-based AI algorithms for CDSSs.