Application of machine learning models to survival risk stratification after radical surgery for thoracic squamous esophageal cancer
- VernacularTitle:机器学习模型在胸段食管鳞状细胞癌术后生存风险分层中的应用研究
- Author:
Jinye XU 1,2,3;
Jianghui ZHOU 1,2,3;
Shengwei LIU 4;
Liangliang CHEN 1,2,3;
Junxi HU 1,2,3;
Xiaolin WANG 1,2,3;
Yusheng SHU 1,2,3
- Author Information:
1. Medical College of Yangzhou University, Yangzhou, 225000, Jiangsu, P. R. China
2, 3. Department of Thoracic Surgery, Northern Jiangsu People's Hospital, Clinical Medicine College of Yangzhou University, Yangzhou, 225000, Jiangsu, P. R. China
4. Department of Thoracic Surgery, The First Hospital Affiliated to Army Medical University, Chongqing, 400038, P. R. China
- Publication Type:Journal Article
- Keywords:
Esophageal neoplasms;
machine learning;
surgery;
prognosis;
prediction model;
survival risk stratification
- From:
Chinese Journal of Clinical Thoracic and Cardiovascular Surgery
2022;29(12):1574-1579
- Country:China
- Language:Chinese
- Abstract:
Objective To explore the value of machine learning models in predicting postoperative survival of patients with thoracic esophageal squamous cell carcinoma.
Methods The clinical data of 369 patients with thoracic esophageal squamous cell carcinoma who underwent radical esophagectomy at the Department of Thoracic Surgery of Northern Jiangsu People's Hospital from January 2014 to September 2015 were retrospectively analyzed. There were 279 (75.6%) males and 90 (24.4%) females, aged 41-78 years. The patients were randomly divided into a training set (259 patients) and a test set (110 patients) at a ratio of 7:3. Variables were screened by best subset selection. Six machine learning models were then constructed on the selected variables and validated in the independent test set. Predictive performance was evaluated by area under the curve (AUC), accuracy and logarithmic loss, and model fit was assessed with calibration curves. The best-performing model was selected as the final model. Risk stratification was performed using X-tile, and survival analysis was performed using the Kaplan-Meier method with the log-rank test.
Results The 5-year postoperative survival rate was 67.5%. No clinicopathological characteristic differed significantly between the training set and the test set (P>0.05). Seven variables were included for modelling: hypertension, history of smoking, history of alcohol consumption, degree of tissue differentiation, pN stage, vascular invasion and nerve invasion. The AUC values of the models in the independent test set were: decision tree (AUC=0.796), support vector machine (AUC=0.829), random forest (AUC=0.831), logistic regression (AUC=0.838), gradient boosting machine (AUC=0.846), and XGBoost (AUC=0.853). The XGBoost model was selected as the best model and used for risk stratification in both the training set and the test set. Patients in each set were divided into a low-risk group, an intermediate-risk group and a high-risk group. In both data sets, the differences in postoperative prognosis among the three groups were statistically significant (P<0.001).
Conclusion Machine learning models have high value in predicting the postoperative prognosis of thoracic esophageal squamous cell carcinoma. The XGBoost model outperformed the other machine learning methods evaluated in predicting 5-year survival of these patients, and it shows high utility and reliability.
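The three performance measures named in the Methods (AUC, accuracy and logarithmic loss) can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names, the 0.5 classification threshold, the label coding (1 = event, 0 = no event) and the toy data are all assumptions made for illustration, and none of the numbers come from the study.

```python
import math

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of cases whose thresholded prediction matches the label.
    The 0.5 threshold is an assumption, not taken from the abstract."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the labels under the predicted
    probabilities; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def roc_auc(y_true, y_prob):
    """AUC computed as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: labels and predicted probabilities for 8 cases.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]

print(f"AUC      = {roc_auc(y_true, y_prob):.3f}")
print(f"accuracy = {accuracy(y_true, y_prob):.3f}")
print(f"log loss = {log_loss(y_true, y_prob):.3f}")
```

AUC depends only on how the cases are ranked, accuracy depends on the chosen threshold, and logarithmic loss penalizes confident wrong probabilities, which is presumably why the three are reported together alongside calibration curves.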