Evaluation of the predictive performance for distant metastasis in gastric adenocarcinoma based on machine learning and the SEER database
10.3969/j.issn.1671-8348.2025.11.028
- VernacularTitle:基于机器学习和SEER数据库的胃腺癌远处转移预测效果评价
- Author: Chao ZHU¹; Ling LI; Min PU
- Author Information:
1. Department of Gastrointestinal Surgery, Nanchong Jialing District People's Hospital, Nanchong, Sichuan 637000, China
- Keywords: gastric adenocarcinoma; machine learning; distant metastasis; lymph nodes; random forest
- From: Chongqing Medicine 2025;54(11):2643-2648
- Country:China
- Language:Chinese
- Abstract:
Objective To evaluate the performance of models constructed from demographic information and tumor characteristics for predicting distant metastasis in patients with gastric adenocarcinoma.
Methods Data on 7,788 patients with gastric adenocarcinoma from the Surveillance, Epidemiology, and End Results (SEER) database (2001-2021) were collected as the training set, and clinical data on 259 patients with gastric adenocarcinoma from Nanchong Jialing District People's Hospital (January 2019 to October 2024) were collected as the test set. Four machine learning algorithms were used to construct risk prediction models for distant metastasis of gastric adenocarcinoma, and multivariate logistic regression was used to analyze the factors influencing distant metastasis in these patients.
Results Distant metastasis was present in 801 cases (10.3%) in the training set and 25 cases (9.7%) in the test set. Multivariate logistic regression analysis showed that high differentiation, advanced T stage, and higher log odds of positive lymph nodes (LODDS) were risk factors for distant metastasis (P<0.05), while older age, primary tumor site at the gastroesophageal junction, and a higher total number of lymph nodes examined were protective factors (P<0.05). Validation of the four models on the test set showed that the predictive ability of the random forest model was superior to that of logistic regression, K-nearest neighbors (KNN), and support vector machine (SVM). The predictive ability of SVM was better than that of logistic regression and KNN, while logistic regression and KNN performed comparably. DeLong's test showed that the predictive performance of the random forest model was superior to that of logistic regression, KNN, and SVM. Calibration curve analysis showed that the predictive accuracy of the random forest model was superior to that of logistic regression, KNN, and SVM. The five most important factors in the random forest model were negative lymph node count, tumor size, metastatic lymph node count, T stage, and age.
Conclusion Among the four models, the risk prediction model for distant metastasis in gastric adenocarcinoma constructed using the random forest algorithm demonstrates the best performance.
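The abstract names LODDS as a predictor but does not state how it was computed. As an illustration only, the sketch below uses a definition common in the lymph node staging literature, the natural log of (positive nodes + 0.5) over (negative nodes + 0.5). The formula, the 0.5 correction, the choice of natural logarithm, and the function name `lodds` are assumptions, not details confirmed by the record.

```python
import math


def lodds(positive_nodes: int, examined_nodes: int) -> float:
    """Log odds of positive lymph nodes (assumed definition, not from the record).

    LODDS = ln((positive + 0.5) / (negative + 0.5)), where
    negative = examined - positive. The 0.5 terms keep the ratio
    finite when either count is zero.
    """
    if positive_nodes < 0 or examined_nodes < positive_nodes:
        raise ValueError("need 0 <= positive_nodes <= examined_nodes")
    negative_nodes = examined_nodes - positive_nodes
    return math.log((positive_nodes + 0.5) / (negative_nodes + 0.5))


if __name__ == "__main__":
    # Hypothetical patients: (positive nodes, total nodes examined).
    for positive, examined in [(0, 20), (3, 20), (15, 20)]:
        print(positive, examined, round(lodds(positive, examined), 3))
```

Under this definition the value is negative when negative nodes outnumber positive ones and rises as the positive share grows, which is in line with the reported direction of effect: higher LODDS as a risk factor and a higher number of nodes examined as protective.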