Efficacy of CT-based interpretable integrated learning model for differentiating lung squamous cell carcinoma and adenocarcinoma
10.19745/j.1003-8868.2025118
- VernacularTitle:基于CT的可解释集成学习模型对肺鳞状细胞癌和腺癌的鉴别效能研究
- Author: Shi-ze QIN(1); Xiu-fu ZHANG; Xue ZHOU; Dan SU; Yong-ying LIU; Fang WANG; Qing JIA
Author Information
1. Department of Radiology, Jiangjin Central Hospital of Chongqing, Chongqing 402260
- Publication Type:Journal Article
- Keywords: non-small cell lung carcinoma; lung squamous cell carcinoma; adenocarcinoma; clinical indicator; CT image feature; radiomics feature; machine learning
- From: Chinese Medical Equipment Journal 2025;46(7):12-20
- Country:China
- Language:Chinese
- Abstract:
Objective To investigate the efficacy of an interpretable ensemble learning model combining clinical indicators, CT image features and radiomics features for the differential diagnosis of lung squamous cell carcinoma and adenocarcinoma, so as to provide a reference for clinical treatment decisions.

Methods A retrospective analysis was conducted on clinical and imaging data from 220 patients (231 lesions) with primary non-small cell lung cancer at Jiangjin Central Hospital of Chongqing (Center 1) and 83 patients (84 lesions) at Chongqing General Hospital (Center 2). The patients were categorized into squamous cell carcinoma and adenocarcinoma groups based on pathological findings. In Center 1, the squamous cell carcinoma group consisted of 60 patients (60 lesions) and the adenocarcinoma group of 160 patients (171 lesions). In Center 2, the squamous cell carcinoma group comprised 18 patients (18 lesions) and the adenocarcinoma group 65 patients (66 lesions). The Center 1 data were randomly partitioned into a training set and a validation set at a 7:3 ratio, while Center 2 served as the independent test set. Firstly, a deep learning model, VB-Net, was used to automatically segment the tumor region on the lung window image. Secondly, the synthetic minority oversampling technique (SMOTE) was used to balance the classes in the training set, and the extracted features were standardized with Z-scores. Thirdly, the least absolute shrinkage and selection operator (LASSO) was used to select the optimal radiomics features and calculate the radiomics score (Radscore), and univariate and multivariate logistic regression were used to screen the clinical indicators and CT image features for independent clinical factors that differentiate lung squamous cell carcinoma from adenocarcinoma. Finally, three ensemble learning algorithms (AdaBoost, Bagging decision tree and XGBoost) were used to combine the independent clinical factors and the Radscore to construct the models. The receiver operating characteristic (ROC) curve was used to evaluate the diagnostic performance of the models. The SHAP (Shapley additive explanations) technique was used to analyze the feature contributions and the model decision-making process.

Results Among the evaluated ensemble models, the AdaBoost and Bagging decision tree models showed overfitting tendencies. In contrast, the XGBoost model showed the best performance, achieving AUC values of 0.939, 0.887 and 0.853 in the training, validation and independent test sets, respectively. SHAP indicated that the Radscore was the most important feature affecting the performance of the model. The decision diagram enabled visualization of the diagnostic process of the model.

Conclusion The interpretable ensemble learning model based on clinical indicators, CT image features and radiomics features is expected to non-invasively differentiate lung squamous cell carcinoma from adenocarcinoma before treatment and to help clinicians make treatment decisions as early as possible. [Chinese Medical Equipment Journal, 2025, 46(7):12-20]
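
The scoring and evaluation steps named in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, synthetic data, three made-up features, and hypothetical coefficients standing in for a fitted LASSO model. It assumes that the Z-score parameters are estimated on the training set only and that the Radscore is a linear combination of standardized features, neither of which the abstract states explicitly. Segmentation, SMOTE, the ensemble classifiers and SHAP are omitted.

```python
import random
import statistics


def zscore_params(rows):
    """Per-feature mean and standard deviation, estimated on the given rows."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against zero variance
    return means, stds


def standardize(row, means, stds):
    """Apply Z-score standardization to one feature vector."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def radscore(z_row, coefs, intercept):
    """Linear score: intercept + sum(coef_i * z_i) (assumed form of the Radscore)."""
    return intercept + sum(c * z for c, z in zip(coefs, z_row))


def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic data only: 100 lesions, 3 made-up features, label 1 = squamous (hypothetical coding).
rng = random.Random(0)
labels = [1 if i < 30 else 0 for i in range(100)]
features = [[rng.gauss(1.0 if y else 0.0, 1.0) for _ in range(3)] for y in labels]

# Random 7:3 split into training and validation indices.
idx = list(range(100))
rng.shuffle(idx)
train_idx, valid_idx = idx[:70], idx[70:]

# Standardization parameters come from the training rows only (an assumption).
means, stds = zscore_params([features[i] for i in train_idx])

# Hypothetical coefficients; in the study these would come from the LASSO fit.
coefs, intercept = [0.8, 0.5, 0.3], -0.4

valid_scores = [
    radscore(standardize(features[i], means, stds), coefs, intercept) for i in valid_idx
]
valid_labels = [labels[i] for i in valid_idx]
valid_auc = auc(valid_labels, valid_scores)
print(f"validation AUC on synthetic data: {valid_auc:.3f}")
```

The pairwise AUC used here is quadratic in the number of lesions, which is acceptable at the cohort sizes reported above; a rank-based formula would be the usual choice for larger data sets.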