1. Evaluating generic and domain-specific large visual models for T staging of esophageal cancer using CT: a study of zero-shot performance and the impact of prompt engineering
Dabing ZHU ; Wei GAO ; Yanghao LIN ; Wuhao LAI ; Zhichao LIANG ; Xianyi ZENG ; Xikai DENG ; Jun AN
Chinese Journal of Medical Physics 2025;42(11):1532-1540
Background: Accurate T-staging is critical for esophageal cancer therapy, but CT-based assessment has significant limitations. Large vision models (LVMs) hold promise, yet their zero-shot clinical diagnostic capability without fine-tuning remains unvalidated.
Methods: A retrospective analysis was conducted on chest CT images from 98 esophageal cancer patients and 50 normal controls. Using radiologist consensus as the gold standard, the zero-shot T-staging performance of three LVMs (GPT-5, Gemini, and MedGemma) was evaluated with prompts of varying complexity.
Results: GPT-5 exhibited the highest accuracy and stability. Significant biases were observed among models: Gemini tended to over-stage, while MedGemma tended to under-stage. All models had difficulty identifying early-stage tumors, but structured prompts improved diagnostic performance for mid-to-late-stage lesions.
Conclusion: LVMs have potential for zero-shot T-staging, but their performance depends heavily on model choice and prompt design. The generic model GPT-5 shows superior zero-shot generalization. However, current model performance is not yet clinically viable, especially for early diagnosis. Future work should focus on fine-tuning with high-quality clinical data and developing standardized prompt frameworks.
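The abstract reports accuracy together with directional bias (over-staging versus under-staging) against a gold standard. As a rough illustration of how such a summary could be computed, the following Python sketch compares predicted stages with reference stages on an ordinal scale. It is not the authors' evaluation code: the function name `staging_summary`, the use of "T0" to represent a normal control, and the toy labels are all assumptions made for illustration and are not data from the study.

```python
from collections import Counter

# Ordinal encoding of stages. Using "T0" for a normal control is an
# assumption of this sketch, not something stated in the abstract.
STAGE_ORDER = {"T0": 0, "T1": 1, "T2": 2, "T3": 3, "T4": 4}


def staging_summary(gold, predicted):
    """Return accuracy plus over- and under-staging rates.

    gold: reference stages (e.g. radiologist consensus)
    predicted: stages returned by a model, in the same order
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equal in length")

    counts = Counter()
    for g, p in zip(gold, predicted):
        diff = STAGE_ORDER[p] - STAGE_ORDER[g]
        if diff == 0:
            counts["correct"] += 1
        elif diff > 0:
            counts["over"] += 1   # model assigned a higher stage than the reference
        else:
            counts["under"] += 1  # model assigned a lower stage than the reference

    n = len(gold)
    return {
        "accuracy": counts["correct"] / n,
        "over_staged": counts["over"] / n,
        "under_staged": counts["under"] / n,
    }


# Hypothetical toy labels, for illustration only (not study data).
gold = ["T0", "T1", "T2", "T3", "T4", "T3"]
pred = ["T0", "T2", "T2", "T3", "T3", "T4"]
summary = staging_summary(gold, pred)
print(summary)
```

Reporting the over- and under-staging rates separately, rather than a single error rate, is what would make a systematic bias in one direction visible for a given model or prompt.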
