1.CRAKUT: integrating contrastive regional attention and clinical prior knowledge in U-transformer for radiology report generation.
Yedong LIANG ; Xiongfeng ZHU ; Meiyan HUANG ; Wencong ZHANG ; Hanyu GUO ; Qianjin FENG
Journal of Southern Medical University 2025;45(6):1343-1352
OBJECTIVES:
We propose a Contrastive Regional Attention and Prior Knowledge-Infused U-Transformer model (CRAKUT) that aims to improve the quality of generated radiology reports by addressing three challenges: imbalanced text distribution, lack of contextual clinical knowledge, and cross-modal information transformation.
METHODS:
The CRAKUT model comprises 3 key components: an image encoder that uses common normal images from the dataset to extract enhanced visual features, an external knowledge infuser that incorporates clinical prior knowledge, and a U-Transformer that facilitates cross-modal information conversion from vision to language. Contrastive regional attention was introduced in the image encoder to enhance the features of abnormal regions by emphasizing the difference between normal and abnormal semantic features. The clinical prior knowledge infuser within the text encoder integrates clinical history and knowledge graphs generated by ChatGPT. Finally, the U-Transformer connects the multi-modal encoder and the report decoder in a U-connection schema, and multiple types of information are fused to obtain the final report.
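The idea of emphasizing the difference between normal and abnormal features can be illustrated with a minimal Python sketch. This is not the authors' formulation: the function name, the use of an L2 norm as the regional score, and the residual update are all assumptions made for illustration only.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def contrastive_regional_attention(regions, normal_reference):
    """Hypothetical sketch: weight each region by how far it is from a normal reference.

    regions: list of per-region feature vectors of the input image.
    normal_reference: list of matching per-region vectors from normal images.
    """
    # Per-region difference between the input and the normal reference.
    diffs = [[x - n for x, n in zip(r, ref)]
             for r, ref in zip(regions, normal_reference)]
    # Assumed score: L2 norm of the difference (larger means more abnormal).
    scores = [math.sqrt(sum(d * d for d in diff)) for diff in diffs]
    weights = softmax(scores)
    # Assumed residual update: add the attention-weighted difference back.
    enhanced = [[x + w * d for x, d in zip(r, diff)]
                for r, diff, w in zip(regions, diffs, weights)]
    return weights, enhanced

regions = [[1.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
normal = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
weights, enhanced = contrastive_regional_attention(regions, normal)
print(weights)
```

In this toy input only the third region deviates from the normal reference, so it receives the largest attention weight while the unchanged regions keep their original features.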
RESULTS:
We evaluated the proposed CRAKUT model on two publicly available chest X-ray (CXR) datasets, IU-Xray and MIMIC-CXR. The CRAKUT model achieved state-of-the-art performance on report generation, with a BLEU-4 score of 0.159, a ROUGE-L score of 0.353, and a CIDEr score of 0.500 on the MIMIC-CXR dataset; on the IU-Xray dataset it achieved a METEOR score of 0.258, outperforming all the comparison models.
CONCLUSIONS:
The proposed method has great potential for application in clinical disease diagnoses and report generation.
Humans; Radiology Information Systems; Radiology
2.Design and validation of a multimodal model integrating text and imaging data for intelligent assessment of psychological stress in college students.
Huirong XIE ; Chaobin HU ; Guohua LIANG ; Hongzhe HAN ; Mu HUANG ; Qianjin FENG
Journal of Southern Medical University 2025;45(11):2504-2510
OBJECTIVES:
We propose a multimodal model integrating social media text and image data for automated assessment of psychological stress in college students to support the development of intelligent mental health services in higher education institutions.
METHODS:
Based on deep learning technology, we designed an evaluation framework comprising a text sentiment modeling module, an image sentiment modeling module, and a multimodal fusion prediction module. Text sentiment features were extracted using Bi-LSTM, and image semantic cues were extracted via U-Net. A feature concatenation strategy was used to enable cross-modal semantic collaboration to achieve automatic identification of 3 psychological stress levels: mild, moderate, and severe. We constructed a multimodal annotated dataset using social platform data from 1577 students across multiple universities in Guangdong Province. After data cleaning, 252 samples were randomly selected for model training and testing.
RESULTS:
In the 3-class classification task, the model achieved an accuracy of 92.86% and an F1 score of 0.9276 on the test set, indicating stable and consistent performance. Confusion matrix analysis further showed that the model effectively distinguished between the different stress levels.
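For readers who want to see how accuracy and an F1 score are derived from a confusion matrix, the following is a minimal Python sketch. The abstract does not state which F1 averaging was used, so macro-averaging is an assumption here, and the matrix values are hypothetical, not the study's data.

```python
def accuracy_and_macro_f1(cm):
    """cm[i][j] = number of samples with true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    f1_scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted otherwise
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return correct / total, sum(f1_scores) / n

# Hypothetical 3-level matrix (rows/columns: mild, moderate, severe).
example_cm = [
    [5, 0, 0],
    [0, 4, 1],
    [0, 0, 5],
]
acc, macro_f1 = accuracy_and_macro_f1(example_cm)
print(acc, macro_f1)
```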
CONCLUSIONS:
The multimodal psychological stress assessment model developed in this study effectively integrates unstructured social behavior data to enhance the scientific rigor and practical applicability of psychological state recognition, and thus provides support for developing intelligent psychological service systems.
Humans; Stress, Psychological/diagnosis*; Students/psychology*; Universities; Social Media; Deep Learning
3.Construction of clinical research database using EDC system key technology
Chinese Journal of Medical Physics 2025;42(1):135-140
The construction of standardized, high-quality clinical research databases is an important step for clinicians and researchers carrying out clinical medical research, and an important guarantee for obtaining and publishing high-quality research results. This review describes the types and challenges of traditional clinical research database creation and focuses on the key technical solutions for constructing high-quality clinical research databases based on an electronic data capture (EDC) system. The solutions comprise 6 modules: integration of an interactive web response system with the EDC system, logical checks, intelligent follow-up management, a data dictionary, trace tracking, and statistical analysis of research data. Together they address the deficiencies of traditional software systems in constructing clinical research databases. In addition, the applications of EDC systems in clinical research database construction are summarized and their development prospects are discussed, providing a reference for clinicians and researchers carrying out high-quality clinical research.
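As an illustration of what a logical check module might do, the following minimal Python sketch applies rule-based checks to a single record. The field names (`age`, `enroll_date`, `visit_date`) and the rules themselves are hypothetical and are not taken from the reviewed EDC system.

```python
from datetime import date

def logic_check(record):
    """Return a list of data queries raised for one record (hypothetical rules)."""
    queries = []
    age = record.get("age")
    if age is None:
        queries.append("age is missing")
    elif not 0 <= age <= 120:
        queries.append("age is outside the plausible range 0-120")
    enroll = record.get("enroll_date")
    visit = record.get("visit_date")
    # A follow-up visit should not precede enrollment.
    if enroll and visit and visit < enroll:
        queries.append("visit_date is earlier than enroll_date")
    return queries

clean = {"age": 54, "enroll_date": date(2024, 1, 10), "visit_date": date(2024, 2, 1)}
dirty = {"age": 154, "enroll_date": date(2024, 1, 10), "visit_date": date(2023, 12, 1)}
print(logic_check(clean))
print(logic_check(dirty))
```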
4.Pancreas segmentation algorithm based on depth-wise convolution and tri-orientated spatial attention
Chinese Journal of Medical Physics 2025;42(1):37-42
A cascaded 3D pancreas segmentation network (CPS-Net) is proposed to address the challenges in pancreas segmentation caused by the organ's small size and complex anatomical structure. CPS-Net is composed of two components: the first uses ResUNet to quickly localize the pancreas region, while the second uses a network that fuses a depth-wise convolution block and a tri-orientated spatial attention module to refine the segmentation results. Specifically, the depth-wise convolution block enhances the differentiation between the pancreas and surrounding tissues by extracting multi-scale features layer by layer, while the tri-orientated spatial attention module combines axial attention, planar attention, and window attention mechanisms to capture the detailed structure of the pancreas in a complex background. On the NIH public dataset, CPS-Net achieved a Dice similarity coefficient, positive predictive value, sensitivity, and Hausdorff distance of 87.42%±1.58%, 87.42%±3.52%, 87.74%±4.58%, and (0.22±0.08) mm, respectively, demonstrating higher pancreas segmentation accuracy than current state-of-the-art segmentation networks.
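The overlap metrics reported above follow standard definitions, which the following minimal Python sketch computes for flattened binary masks. The example masks are hypothetical and unrelated to the NIH dataset; the Hausdorff distance is omitted for brevity.

```python
def overlap_metrics(pred, truth):
    """Dice, positive predictive value, and sensitivity for flattened binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    # Dice = 2TP / (2TP + FP + FN); treated as 1.0 when both masks are empty.
    dice = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return dice, ppv, sensitivity

pred_mask = [1, 1, 1, 0, 0, 0]
true_mask = [1, 1, 0, 1, 0, 0]
dice, ppv, sensitivity = overlap_metrics(pred_mask, true_mask)
print(dice, ppv, sensitivity)
```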
7.A multi-constraint optimal puncture path planning algorithm for percutaneous interventional radiofrequency thermal fusion of the L5/S1 segments
Hu LIU ; Zhihai SU ; Chengjie HUANG ; Lei ZHAO ; Yangfan CHEN ; Yujia ZHOU ; Hai LÜ ; Qianjin FENG
Journal of Southern Medical University 2024;44(9):1783-1795
OBJECTIVE:
To minimize variations in treatment outcomes of L5/S1 percutaneous intervertebral radiofrequency thermocoagulation (PIRFT) arising from differences in physician proficiency, and to achieve precise quantitative risk assessment of the puncture paths.
METHODS:
We used a self-developed deep neural network, DWT-UNet, for automatic segmentation of magnetic resonance (MR) images of the L5/S1 segments into 7 key structures: L5, S1, Ilium, Disc, N5, Dura mater, and Skin, based on which a needle insertion path planning environment was modeled. Six hard constraints and 6 soft constraints were proposed based on clinical criteria for needle insertion, and the physician's experience was quantified into weights using the analytic hierarchy process and incorporated into the risk function for needle insertion paths to enhance adaptability to individual cases. By combining the proposed skin entry point sampling sub-algorithm and Kambin's triangle projection area sub-algorithm with the analytic hierarchy process, and by employing ray tracing, CPU multi-threading, and GPU parallel computing, a puncture path was calculated that not only met the clinical hard constraints but also optimized the overall soft constraints.
RESULTS:
A surgical team conducted a subjective evaluation of the 21 needle puncture paths planned by the algorithm. All the paths met the clinical requirements, and 95.24% of them were rated excellent or good. Compared with the physicians' planning results, the plans generated by the algorithm were inferior in D_Ilium, D_S1, and Depth (P<0.05) but much better in D_Dura, D_L5, D_N5, and A_Kambin (P<0.05). In the 21 cases, the planning time of the algorithm averaged 7.97±3.73 s, much shorter than that of the physicians (typically more than 10 min).
CONCLUSION:
The multi-constraint optimal puncture path planning algorithm offers an efficient automated solution for PIRFT of the L5/S1 segments with great potential for clinical application.
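The combination of analytic hierarchy process (AHP) weights with hard and soft constraints can be sketched in Python as follows. This is a simplified illustration, not the authors' algorithm: the row geometric-mean approximation of AHP weights, the pairwise comparison values, and the candidate path fields `hard_ok` and `soft_costs` are all assumptions.

```python
import math

def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method."""
    gms = [math.prod(row) ** (1.0 / len(row)) for row in pairwise]
    total = sum(gms)
    return [g / total for g in gms]

def path_risk(hard_ok, soft_costs, weights):
    """Infinite risk if any hard constraint fails, else a weighted sum of soft costs."""
    if not all(hard_ok):
        return math.inf
    return sum(w * c for w, c in zip(weights, soft_costs))

def best_path(candidates, weights):
    # Pick the candidate path with the lowest risk.
    return min(candidates,
               key=lambda c: path_risk(c["hard_ok"], c["soft_costs"], weights))

# Hypothetical pairwise importance judgments for 3 soft constraints.
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)

# Hypothetical candidate paths with normalized soft costs in [0, 1].
candidates = [
    {"name": "A", "hard_ok": [True, True], "soft_costs": [0.2, 0.9, 0.9]},
    {"name": "B", "hard_ok": [True, False], "soft_costs": [0.0, 0.0, 0.0]},
    {"name": "C", "hard_ok": [True, True], "soft_costs": [0.6, 0.1, 0.1]},
]
print(weights, best_path(candidates, weights)["name"])
```

Path B has the lowest soft costs but violates a hard constraint, so it is excluded regardless of its soft-constraint score.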
9.Identification of osteoid and chondroid matrix mineralization in primary bone tumors using a deep learning fusion model based on CT and clinical features: a multi-center retrospective study.
Caolin LIU ; Qingqing ZOU ; Menghong WANG ; Qinmei YANG ; Liwen SONG ; Zixiao LU ; Qianjin FENG ; Yinghua ZHAO
Journal of Southern Medical University 2024;44(12):2412-2420
METHODS:
We retrospectively collected CT scan data from 276 patients with pathologically confirmed primary bone tumors from 4 medical centers in Guangdong Province between January 2010 and August 2021. A convolutional neural network (CNN) was employed as the deep learning architecture. The optimal baseline deep learning model (R-Net) was determined through transfer learning, and an optimized model (S-Net) was obtained through algorithmic improvements. Multivariate logistic regression analysis was used to screen clinical features such as sex, age, mineralization location, and pathological fractures, which were then combined with the imaging features to construct the deep learning fusion model (SC-Net). The diagnostic performance of the SC-Net model and the machine learning models was compared with radiologists' diagnoses, and classification performance was evaluated using the area under the receiver operating characteristic curve (AUC) and the F1 score.
RESULTS:
In the external test set, the fusion model (SC-Net) achieved the best performance with an AUC of 0.901 (95% CI: 0.803-1.00), an accuracy of 83.7% (95% CI: 69.3%-93.2%) and an F1 score of 0.857, and outperformed the S-Net model with an AUC of 0.818 (95% CI: 0.694-0.942), an accuracy of 76.7% (95% CI: 61.4%-88.2%), and an F1 score of 0.828. The overall classification performance of the fusion model (SC-Net) exceeded that of radiologists' diagnoses.
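For reference, the AUC can be computed directly from its rank-based definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following minimal Python sketch uses hypothetical labels and scores, not the study's data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(example_labels, example_scores)
print(auc)
```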
CONCLUSIONS:
The deep learning fusion model based on multi-center CT images and clinical features can accurately classify osteoid and chondroid matrix mineralization and may improve the accuracy of clinical diagnoses of osteogenic versus chondrogenic primary bone tumors.
Humans; Deep Learning; Bone Neoplasms/diagnostic imaging*; Retrospective Studies; Tomography, X-Ray Computed/methods*; Neural Networks, Computer; Male; Female; ROC Curve; Algorithms
10.Deep learning-based dose prediction in radiotherapy planning for head and neck cancer.
Lin TENG ; Bin WANG ; Qianjin FENG
Journal of Southern Medical University 2023;43(6):1010-1016
OBJECTIVE:
To propose a deep learning-based algorithm for automatic prediction of dose distribution in radiotherapy planning for head and neck cancer.
METHODS:
We propose a novel beam dose decomposition learning (BDDL) method built on a cascade network. The delivery manner of the beams through the planning target volume (PTV) was fitted with the pre-defined beam angles and served as an input to the convolutional neural network (CNN). The output of the network was decomposed into multiple sub-fractions of the dose distribution along the beam directions, so that a complex task was carried out as multiple simpler sub-tasks, allowing the model to focus more on extracting local features. The sub-fractions of the dose distribution map were merged into a single distribution map using the proposed multi-voting mechanism. We also introduced dose distribution features of the regions-of-interest (ROIs) and a boundary map into the loss function during the training phase to constrain the network when extracting features of the ROIs and dose boundary areas. Public datasets of radiotherapy planning for head and neck cancer were used to evaluate the dose distribution accuracy of the BDDL method and to perform the ablation study.
RESULTS:
The BDDL method achieved a Dose score of 2.166 and a DVH score of 1.178 (P < 0.05), demonstrating higher prediction accuracy than current state-of-the-art (SOTA) methods. Compared with the C3D method, which ranked first in the OpenKBP-2020 Challenge, the BDDL method improved the Dose score and DVH score by 26.3% and 30%, respectively. The results of the ablation study also demonstrated the effectiveness of each key component of the BDDL method.
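As a rough illustration of a dose-error metric, the following Python sketch computes a mean absolute voxel-wise dose difference inside a mask, which is how the Dose score is commonly described for the OpenKBP challenge. Whether the abstract's Dose score is computed exactly this way is an assumption, and the voxel values are hypothetical.

```python
def dose_score(predicted, reference, mask):
    """Mean absolute dose difference over voxels where mask is truthy (lower is better)."""
    diffs = [abs(p - r) for p, r, m in zip(predicted, reference, mask) if m]
    if not diffs:
        raise ValueError("mask selects no voxels")
    return sum(diffs) / len(diffs)

# Hypothetical flattened dose maps in Gy; the last voxel lies outside the mask.
predicted_dose = [10.0, 20.0, 30.0, 40.0]
reference_dose = [12.0, 20.0, 27.0, 0.0]
dose_mask = [1, 1, 1, 0]
score = dose_score(predicted_dose, reference_dose, dose_mask)
print(score)
```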
CONCLUSION:
The BDDL method uses prior knowledge of the delivery manner of the beams and of the dose distribution in the ROIs to establish a dose prediction model. Compared with existing methods, the proposed method is interpretable and reliable and can potentially be applied in clinical radiotherapy.
Humans; Deep Learning; Head and Neck Neoplasms/radiotherapy*; Algorithms; Neural Networks, Computer
