NLUS-VQA: construction and evaluation of a visual question answering model for neonatal lung ultrasound diagnosis
10.3760/cma.j.cn113903-20250628-00344
- VernacularTitle:NLUS-VQA: construction and evaluation of a large visual question answering model for neonatal lung ultrasound diagnosis
- Author:Xuming TONG¹; Jiangang CHEN; Yiran WANG; Xiqing ZHAO; Yanhong YUAN; Zishuo WANG; Peng JIANG; Qingyao XIONG; Renxing LI; Xueli WANG; Jing LIU
Author Information
1. School of Information Science and Engineering, Hebei North University, Zhangjiakou 075000
- Publication Type:Journal Article
- Keywords:Neonate; Lung ultrasound; Visual question answering model; Artificial intelligence; LoRA fine-tuning
- From:Chinese Journal of Perinatal Medicine 2025;28(11):917-928
- Country:China
- Language:Chinese
- Abstract:
Objective: To develop and evaluate a medical visual question answering (VQA) model for neonatal lung ultrasound (LUS) images to support intelligent auxiliary diagnosis of neonatal pulmonary diseases.

Methods: Using data from neonates admitted to Beijing Obstetrics and Gynecology Hospital, Capital Medical University, between January 2023 and December 2024, we constructed an image-question-answer dataset of 251 LUS images [43 pneumonia (17.1%), 42 neonatal respiratory distress syndrome (16.7%), 83 transient tachypnea of the newborn (33.1%), and 83 normal (33.1%)], organized under a four-tier medical question-answer framework. Building on the Qwen2.5-VL-7B base model, we combined LoRA fine-tuning with chain-of-thought prompting to develop the NLUS-VQA model, which strengthens visual-language semantic alignment, supports stepwise clinical reasoning, and adapts efficiently to a small sample. Model performance was assessed with natural language generation metrics (BLEU-4, ROUGE-1, ROUGE-2, and ROUGE-L), qualitative evaluation of sonographic feature recognition, and clinical consistency analysis.

Results: (1) Quantitatively, NLUS-VQA achieved scores of 22.38 (BLEU-4), 48.26 (ROUGE-1), 22.40 (ROUGE-2), and 37.20 (ROUGE-L), significantly outperforming the baseline models. (2) Qualitatively, the model performed well in identifying lung consolidation, confluent B-lines, and the snowflake sign, and its chain-of-thought strategy improved clinical interpretability and answer accuracy. (3) Clinically, NLUS-VQA achieved a Cohen's kappa of 0.78 and a diagnostic accuracy of 80.8% (21/26), indicating substantial agreement with clinical experts.

Conclusion: The NLUS-VQA model shows robust interpretability in recognizing key sonographic patterns (e.g., lung consolidation, confluent B-lines, and the snowflake sign) and provides a scalable framework for small-sample medical image analysis, although its diagnostic performance on complex conditions remains limited by the small dataset and the underrepresentation of minority classes.
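The abstract reports BLEU-4 but does not state the toolkit, tokenizer, or smoothing used. As a rough illustration only, the sketch below computes corpus-level BLEU-4 assuming pre-tokenized answers (for Chinese text, character-level tokens such as `list(answer)` are one common choice), a single reference answer per question, uniform 1- to 4-gram weights, and no smoothing. The function names `ngram_counts` and `corpus_bleu4` are hypothetical, not the authors' code.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list (empty if the list is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(candidates, references):
    """Corpus-level BLEU-4: one reference per candidate, uniform weights, no smoothing."""
    matches = [0, 0, 0, 0]  # clipped n-gram matches for n = 1..4
    totals = [0, 0, 0, 0]   # candidate n-gram counts for n = 1..4
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        cand_len += len(cand)
        ref_len += len(ref)
        for n in range(1, 5):
            cand_ngrams, ref_ngrams = ngram_counts(cand, n), ngram_counts(ref, n)
            # Counter intersection keeps min(candidate count, reference count): the clipped count
            matches[n - 1] += sum((cand_ngrams & ref_ngrams).values())
            totals[n - 1] += sum(cand_ngrams.values())
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero n-gram precision makes BLEU zero
    # Geometric mean of the four modified n-gram precisions
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / 4
    # Brevity penalty: penalize candidates that are shorter overall than the references
    bp = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return bp * math.exp(log_precision)
```

The function returns a value in [0, 1]; multiplying by 100 gives the scale used for the scores in the abstract.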
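ROUGE-1 and ROUGE-2 count overlapping unigrams and bigrams between a generated answer and the reference answer, while ROUGE-L uses their longest common subsequence (LCS). The abstract does not specify the implementation, so the following is a minimal sketch assuming pre-tokenized input, one reference per answer, and the F1 form of each score (some toolkits weight recall more heavily in ROUGE-L). All function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list (empty if the list is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0 when nothing overlaps."""
    if overlap == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1 from clipped n-gram overlap."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b):
            # extend the diagonal on a match, else take the best of up / left
            curr.append(prev[j] + 1 if x == y else max(prev[j + 1], curr[j]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1: LCS length relative to candidate and reference lengths."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))
```

For example, with character tokens `list("abcd")` against `list("acde")`, ROUGE-1 and ROUGE-L are both 0.75 and ROUGE-2 is 1/3; corpus scores are usually averaged over answers and scaled by 100.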
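The clinical consistency analysis summarizes model-expert agreement with Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each rater's label frequencies. The sketch below is an illustration only: it assumes unweighted kappa over single diagnostic labels per image, and the label strings and function names are hypothetical. It is not meant to reproduce the reported 0.78 and 80.8%, since the abstract does not describe exactly which items were scored.

```python
from collections import Counter


def accuracy(predicted, reference):
    """Fraction of items where the model label equals the reference (expert) label."""
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty list of items")
    n = len(rater_a)
    # Observed agreement p_o (equal to accuracy when one rater is the reference)
    p_o = accuracy(rater_a, rater_b)
    # Chance agreement p_e from the product of the two raters' marginal label counts
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return float("nan")  # both raters used one identical label; kappa is undefined
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for six images (not data from the study)
expert = ["pneumonia", "NRDS", "TTN", "normal", "TTN", "normal"]
model = ["pneumonia", "NRDS", "TTN", "normal", "normal", "normal"]
print(accuracy(model, expert), cohen_kappa(model, expert))
```

In this toy case the model agrees with the expert on 5 of 6 images (p_o = 5/6), chance agreement is p_e = 10/36, and kappa is 20/26 ≈ 0.77: lower than raw accuracy because part of the agreement is attributed to chance.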