Prediction of postoperative pulmonary complications in video-assisted thoracic surgery for lung cancer based on cardiopulmonary exercise testing and machine learning
- VernacularTitle:基于心肺运动试验与机器学习预测胸腔镜肺癌切除术后肺部并发症
- Author:
Lei GUO 1,2; Fusong LIU 3; Zhilong OU 4; Lan GUO 5; Tiantian LI 4; Chongfeng ZHOU 2; Kun LUAN 2; Xiaoman CHEN 4; Yucheng WEI 1,6
Author Information
1. Qingdao University Medical College, Qingdao, 266000, Shandong, P. R. China
2. Department of Thoracic Surgery, The Third People's Hospital of Qingdao University, Qingdao, 266045, Shandong, P. R. China
3. Department of Cardiology, The Third People's Hospital of Qingdao University, Qingdao, 266045, Shandong, P. R. China
4. Department of Oncology, The Third People's Hospital of Qingdao University, Qingdao, 266045, Shandong, P. R. China
5. Department of Cardiology, Guangdong Provincial People's Hospital, Guangzhou, 510080, P. R. China
6. Department of Thoracic Surgery, West Coast Hospital of Qingdao University, Qingdao, 266000, Shandong, P. R. China
- Publication Type:Journal Article
- Keywords:
Cardiopulmonary exercise testing;
machine learning;
video-assisted thoracic surgery;
postoperative pulmonary complications
- From:
Chinese Journal of Clinical Thoracic and Cardiovascular Surgery
2026;33(01):44-52
- Country:China
- Language:Chinese
- Abstract:
Objective To develop a predictive model for postoperative pulmonary complications (PPC) following video-assisted thoracic surgery (VATS) in lung cancer patients by integrating cardiopulmonary exercise testing (CPET) parameters with machine learning techniques.
Methods A retrospective analysis was conducted on patients with early-stage non-small cell lung cancer who underwent CPET and VATS at Guangdong Provincial People’s Hospital between October 2021 and July 2023. Patients were divided into a PPC group and a non-PPC group. Least absolute shrinkage and selection operator (LASSO) regression was used to select the features most strongly associated with PPC. Six machine learning algorithms were used to construct prediction models: logistic regression, support vector machine, k-nearest neighbors, random forest, gradient boosting machine, and extreme gradient boosting. The optimal model was interpreted using SHapley Additive exPlanations (SHAP).
Results A total of 325 patients were included, with a mean age of 60.36 years; 55.1% were male. The PPC and non-PPC groups differed significantly in age, diabetes, coronary heart disease, surgical approach, forced expiratory volume in 1 second (FEV1), forced vital capacity (FVC), FVC% predicted, peak oxygen uptake (peak VO2), anaerobic threshold (AT), and ventilatory equivalent for carbon dioxide slope (VE/VCO2 slope) (P<0.05). Among the models built on the 7 key features selected by LASSO regression, the random forest model showed the best overall performance across metrics, with an area under the receiver operating characteristic curve of 0.930, an F1 score of 0.836, and a Brier score of 0.133 in the training set. It also showed good predictive ability and calibration in the test set. SHAP analysis ranked feature importance as follows: peak VO2, VE/VCO2 slope, age, FEV1, smoking history, diabetes, and surgical approach.
Conclusion The random forest model integrating CPET parameters can effectively identify patients at high risk of PPC and has potential for clinical application.
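
The abstract reports three evaluation metrics: area under the receiver operating characteristic curve (discrimination), F1 score (balance of precision and recall at a decision threshold), and Brier score (mean squared error of predicted probabilities, where lower is better). The following is a minimal sketch of how these quantities are commonly computed, not the authors' implementation. The function names, the 0.5 threshold, and the toy labels and probabilities are illustrative assumptions and do not come from the study.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def f1_score(y_true: Sequence[int], y_prob: Sequence[float],
             threshold: float = 0.5) -> float:
    """F1 = 2*TP / (2*TP + FP + FN) after thresholding the probabilities.

    The 0.5 threshold is an assumption; the abstract does not state one.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 1)
    fp = sum(1 for y, yh in zip(y_true, y_pred) if y == 0 and yh == 1)
    fn = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored above
    a random negative case (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = complication occurred, 0 = no complication.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_probs = [0.9, 0.2, 0.6, 0.4, 0.7, 0.3]

if __name__ == "__main__":
    print("AUC:  ", round(roc_auc(toy_labels, toy_probs), 3))
    print("F1:   ", round(f1_score(toy_labels, toy_probs), 3))
    print("Brier:", round(brier_score(toy_labels, toy_probs), 3))
```

Because the Brier score is computed on probabilities rather than thresholded labels, it reflects calibration as well as discrimination, which is why it is often reported alongside AUC and F1.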