Single-Cell and Machine Learning-Based Identification of Epithelial Subsets and Prognostic Modeling in Triple-Negative Breast Cancer
- DOI:10.3971/j.issn.1000-8578.2026.25.0694
- VernacularTitle:基于单细胞与机器学习的三阴性乳腺癌上皮细胞亚群解析与预后模型构建
- Author:
Jinpeng WU 1; Xue GUO 2; Engu LIU 3; Feng LIN 1; Hongtao LI 1
Author Information
1. Department of Breast and Thyroid Surgery, Affiliated Tumor Hospital of Xinjiang Medical University, Urumqi 830000, China.
2. Department of Breast and Thyroid Surgery, Affiliated Tumor Hospital of Xinjiang Medical University, Urumqi 830000, China;School of Clinical Medicine, Xinjiang Medical University, Urumqi 830017, China.
3. School of Clinical Medicine, Xinjiang Medical University, Urumqi 830017, China.
- Publication Type:BASICRESEARCH
- Keywords:
Triple-negative breast cancer;
Epithelial cell;
Machine learning;
Prognostic model;
scRNA-seq;
hdWGCNA
- From:
Cancer Research on Prevention and Treatment
2026;53(4):251-266
- Country:China
- Language:Chinese
- Abstract:
Objective To investigate the heterogeneity and key molecular features of epithelial cells in triple-negative breast cancer (TNBC), identify prognostic biomarkers, and develop a robust survival prediction model.
Methods Epithelial cells were extracted from TNBC single-cell transcriptomic data, normalized, and subclustered to characterize their molecular signatures and functional differences. High-dimensional weighted gene co-expression network analysis (hdWGCNA) was applied to identify co-expression modules in epithelial cells. Multiple machine learning algorithms were integrated to select key prognostic genes and build a risk-score model, whose performance was evaluated using receiver operating characteristic (ROC) curves and Kaplan-Meier (K-M) survival analysis. Immune microenvironment features and potential drug-response differences between the high- and low-risk groups were then systematically assessed. Finally, PCR was performed to validate the differential expression of the key genes between tumor and normal tissues.
Results We characterized the composition and molecular features of TNBC epithelial subpopulations and identified a TNBC-associated epithelial subset. By integrating hdWGCNA with machine learning approaches, 10 key genes were selected to construct a prognostic model, which stratified patients into distinct survival-risk groups and showed favorable predictive performance in ROC and K-M analyses. Immune profiling revealed differences between the high- and low-risk groups in the infiltration levels of seven immune cell types and in immune function-related features. Drug-sensitivity analysis suggested potential differential responses to eight agents across the risk groups. PCR validation confirmed the differential expression of the 10 signature genes between tumor and normal tissues.
Conclusion This study reveals epithelial heterogeneity in TNBC at single-cell resolution and establishes a 10-gene prognostic model, which may facilitate the stratification of TNBC risk and the evaluation of immune characteristics and potential therapeutic strategies.
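
The risk-score step described in the Methods can be illustrated with a minimal sketch. The abstract does not state the scoring formula or the cutoff, so this assumes a linear score (the sum of coefficient times expression over the signature genes) and a median split into high- and low-risk groups. The gene names, coefficients, and expression values are hypothetical placeholders, not the study's 10-gene signature.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Return one linear risk score per patient.

    Assumption: score = sum(coefficient * expression) over the signature genes.
    """
    return {
        patient: sum(coefficients[gene] * values[gene] for gene in coefficients)
        for patient, values in expression.items()
    }


def stratify_by_median(scores):
    """Label patients 'high' if their score exceeds the cohort median, else 'low'.

    Assumption: a median cutoff; the study may have used a different threshold.
    """
    cutoff = median(scores.values())
    return {p: ("high" if s > cutoff else "low") for p, s in scores.items()}


# Hypothetical two-gene signature and four-patient cohort (illustration only).
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
expression = {
    "P1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "P2": {"GENE_A": 4.0, "GENE_B": 0.0},
    "P3": {"GENE_A": 0.0, "GENE_B": 4.0},
    "P4": {"GENE_A": 8.0, "GENE_B": 4.0},
}

scores = risk_scores(expression, coefficients)
groups = stratify_by_median(scores)
```

The resulting group labels are what a K-M comparison or ROC evaluation would then take as input; those steps need survival times and event indicators, which are omitted here.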