Predicting axillary lymph node metastasis in invasive breast cancer using machine learning models based on serum biomarkers and other clinical features
10.12354/j.issn.1000-8179.2025.20250203
- VernacularTitle:基于血清标志物等临床特征的机器学习模型在浸润性乳腺癌腋窝淋巴结转移预测中的应用研究
- Author:Yilihamu YIPALA 1; Wang LEI; Ma TAO; Gao CHUNJIE; Liu JING; Zhao TING; Wang YAN
Author Information
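1. School of Public Health, Xinjiang Medical University (Urumqi 830017); Office of Cancer Prevention and Treatment Research, Affiliated Cancer Hospital of Xinjiang Medical University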
- Publication Type:Journal Article
- Keywords:breast cancer; axillary lymph node metastasis (ALNM); machine learning (ML); serum biomarkers; SHAP values
- From:Chinese Journal of Clinical Oncology 2025;52(10):507-514
- Country:China
- Language:Chinese
- Abstract:
Objective: Serum tumor markers (STMs) are important indicators associated with metastasis in patients with breast cancer (BC). This study aimed to predict the risk of axillary lymph node metastasis (ALNM) in patients with invasive BC in Xinjiang by combining STMs with clinicopathological factors.
Methods: Data from 3,360 patients diagnosed with invasive BC and treated at the Affiliated Cancer Hospital of Xinjiang Medical University between 2015 and 2019 were analyzed, focusing on 11 relevant demographic and clinical factors. Five machine learning (ML) algorithms were used to develop predictive models for ALNM. Their performance was compared using the area under the curve (AUC), accuracy, Kappa value, and Brier score. The best-performing model was then compared with a nomogram based on logistic regression (LR) to determine the final model. Shapley additive explanations (SHAP) values were used to rank the importance of factors contributing to ALNM.
Results: Of the 3,266 patients studied, 1,368 (41.89%) developed ALNM. Among the five ML models, eXtreme gradient boosting (XGBoost) demonstrated the best predictive performance, with an AUC of 0.768, an accuracy of 0.735, and a Kappa value of 0.450. In both the training and validation sets, the XGBoost model outperformed the LR-based nomogram (XGBoost vs. nomogram; training set AUC: 0.822 (0.810~0.820) vs. 0.742 (0.721~0.763), Brier score: 0.170 (0.163~0.177) vs. 0.197 (0.189~0.204); validation set AUC: 0.769 (0.740~0.770) vs. 0.747 (0.716~0.779), Brier score: 0.190 (0.178~0.202) vs. 0.195 (0.189~0.204)). XGBoost was therefore selected as the final predictive model. SHAP analysis identified T stage, age, molecular subtype, and CEA level as the four most influential factors for ALNM prediction.
Conclusions: The XGBoost model effectively predicts the risk of ALNM in patients with invasive BC based on STMs and clinicopathological features, outperforming a traditional LR-based nomogram. SHAP analysis highlighted T stage as the most influential factor for ALNM.
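
The four comparison metrics named in the abstract (AUC, accuracy, Kappa value, Brier score) can be illustrated with a minimal, standard-library Python sketch. This is not the authors' code: the function names and the toy labels and probabilities below are hypothetical, and the 0.5 decision threshold is an assumption, since the abstract does not state one.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive case is scored
    above a random negative case (Mann-Whitney form; ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of cases whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Agreement beyond chance for binary labels: (p_obs - p_exp) / (1 - p_exp).
    Undefined (division by zero) when chance agreement p_exp equals 1."""
    n = len(y_true)
    p_obs = accuracy(y_true, y_pred)
    p_exp = sum(
        (list(y_true).count(c) / n) * (list(y_pred).count(c) / n) for c in (0, 1)
    )
    return (p_obs - p_exp) / (1 - p_exp)


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome
    (lower is better; 0 is a perfect probabilistic forecast)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data: 1 = ALNM present, 0 = ALNM absent.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
y_pred = [int(p >= 0.5) for p in y_prob]  # assumed 0.5 threshold

print("AUC     :", round(auc(y_true, y_prob), 4))          # 0.9375
print("Accuracy:", round(accuracy(y_true, y_pred), 4))     # 0.75
print("Kappa   :", round(cohen_kappa(y_true, y_pred), 4))  # 0.5
print("Brier   :", round(brier_score(y_true, y_prob), 4))  # 0.125
```

AUC and the Brier score are computed from predicted probabilities, whereas accuracy and Kappa depend on the thresholded labels, which is one reason a study might report all four together.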