Application of Random Survival Forest in Prognosis Analysis of Genetic Data in Patients with Colorectal Cancer
10.11783/j.issn.1002-3674.2024.04.011
- VernacularTitle:随机生存森林在结直肠癌患者基因数据预后分析中的应用研究
- Author:
Huaxia MU (1); Weixiao BU; Mengyao GAO
Author Information
1. School of Public Health, Shandong Second Medical University (261053)
- Keywords:
Random survival forest;
Lasso-Cox regression;
Colorectal cancer;
Genetic data;
Prognostic analysis
- From:
Chinese Journal of Health Statistics
2024;41(4):532-538
- Country:China
- Language:Chinese
- Abstract:
Objective To identify prognostic factors for colorectal cancer patients from gene expression data using a random survival forest (RSF) model. Methods Differentially expressed genes were screened from the colorectal cancer gene expression data in the TCGA database and combined with clinical and survival information. An RSF model was constructed and compared with a traditional Lasso-Cox regression model. Results The RSF model identified 13 important factors affecting the prognosis of colorectal cancer patients, including the HAND1 (VIMP = 0.090) and PCOLCE2 (VIMP = 0.075) genes, and was used to analyze the interactions among pathological N, the PCOLCE2 gene, and the IGSF9 gene. Compared with the Lasso-Cox model, the RSF model showed better calibration (integrated Brier score, IBS; training set: 0.205 vs. 0.214; test set: 0.210 vs. 0.221), although its prediction error rate was slightly higher (1 - C-index; training set: 0.296 vs. 0.213; test set: 0.369 vs. 0.332). Conclusion The RSF model performs well in analyzing right-censored survival data, can identify important prognostic factors and interactions between variables, and provides a scientific basis for improving the prognosis and quality of life of colorectal cancer patients.
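
The prediction error rate reported in the abstract is 1 - C-index. As an illustration only, the following minimal Python sketch shows one common way this quantity is computed for right-censored data (Harrell's concordance index). It is not the authors' code; the function name `harrell_c_index` and the toy values are hypothetical, and the abstract does not say which software or C-index variant was used.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times       : observed follow-up times
    events      : 1 if the event (death) was observed, 0 if censored
    risk_scores : model output, higher score = higher predicted risk

    A pair (i, j) is comparable only if subject i has the shorter time
    AND subject i's event was observed (otherwise the ordering of the
    true survival times is unknown).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # higher risk failed earlier: concordant
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy data (not from the study): five patients.
times = [5, 8, 12, 15, 20]
events = [1, 0, 1, 1, 0]
risk_scores = [0.9, 0.4, 0.3, 0.7, 0.2]

c_index = harrell_c_index(times, events, risk_scores)
error_rate = 1.0 - c_index  # the "1 - C-index" quantity quoted in the abstract
print(f"C-index = {c_index:.3f}, error rate = {error_rate:.3f}")
```

In this toy example six of the seven comparable pairs are ordered correctly, so the C-index is 6/7 and the error rate is 1/7. A value of 0.5 corresponds to random ordering, so the error rates quoted in the abstract (all below 0.5) indicate that both models rank patients better than chance.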