Construction and Validation of Risk Prediction Model for Lung Adenocarcinoma based on TCGA Database
10.11783/j.issn.1002-3674.2025.02.007
- VernacularTitle:基于TCGA数据库肺腺癌的风险预测模型的构建与验证
- Author:Mengyao GAO (1); Huaxia MU; Weixiao BU
- Author Information:
1. School of Public Health, Shandong Second Medical University (261053)
- Publication Type:Journal Article
- Keywords:Lung adenocarcinoma; Differentially expressed genes; Elastic net; Cox regression analysis; Risk prediction model
- From:Chinese Journal of Health Statistics 2025;42(2):191-196
- Country:China
- Language:Chinese
- Abstract:
Objective To screen the key genes and clinical characteristics associated with death or prognosis in patients with lung adenocarcinoma (LUAD) using The Cancer Genome Atlas (TCGA) database, and then to construct a LUAD risk prediction model and validate its performance.
Methods Clinical information and RNA sequencing data of LUAD patients were extracted from the TCGA database. Differentially expressed genes were screened, and hub genes were selected through a protein-protein interaction (PPI) network. Seventy percent of the data was used as the training set, and the entire data set was used as the validation set. In the training set, elastic net regression was used to select prognostic genes and clinical characteristics, and multivariable Cox regression was used to build the risk prediction model. Predictive performance was evaluated with the area under the receiver operating characteristic curve (AUC), the C-index, and calibration curves, and the model was then verified in the validation set.
Results Elastic net regression identified 23 factors associated with the survival status of LUAD patients. The variables finally included in the prediction model were 6 genes (SEC61A1, P=0.004; MAP2K1, P=0.026; MMP1, P=0.001; SLC2A1, P=0.010; B4GALT1, P<0.001; ERO1A, P=0.024), M stage (P=0.003), and N stage (P<0.001). In the training and validation sets, the AUC was 0.764 and 0.710 and the C-index was 0.732 and 0.704, respectively. The time-dependent ROC (tdROC) curves and calibration curves showed that the model's predicted values agreed closely with the observed values. Kaplan-Meier survival curves showed that survival time was significantly longer in the low-risk group than in the high-risk group (P<0.05).
Conclusion Low expression of 2 genes (SEC61A1 and MAP2K1), high expression of 4 genes (MMP1, SLC2A1, B4GALT1, and ERO1A), distant metastasis of the primary tumor, and more advanced lymph node metastasis were associated with significantly shorter survival in LUAD patients. The prognostic model based on elastic net regression has satisfactory predictive ability and can provide a scientific basis for predicting the risk of death in LUAD.
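The abstract states that elastic net regression screened candidate factors before a multivariable Cox model was fitted, but it does not report the mixing or penalty parameters. For orientation only, one common form of the elastic-net-penalized Cox objective (the glmnet-style parameterization; the scaling constant in front of the log partial likelihood differs between software packages) is sketched below, followed by the linear risk score a fitted Cox model assigns to each patient. All symbols ($x_i$, $t_i$, $\delta_i$, $R(t_i)$, $\alpha$, $\lambda$, $\hat\gamma_k$) are generic notation assumed here, not quantities reported in the study, and tied death times are ignored.

```latex
\begin{align*}
\ell(\beta) &= \sum_{i:\,\delta_i = 1} \Big[ x_i^{\top}\beta
  - \log \sum_{j \in R(t_i)} \exp\big(x_j^{\top}\beta\big) \Big] \\
\hat{\beta} &= \arg\min_{\beta} \Big\{ -\tfrac{1}{n}\,\ell(\beta)
  + \lambda \Big[ \alpha \sum_{k=1}^{p} |\beta_k|
  + \tfrac{1-\alpha}{2} \sum_{k=1}^{p} \beta_k^{2} \Big] \Big\} \\
\mathrm{RS}_i &= \sum_{k} \hat{\gamma}_k \, x_{ik}
\end{align*}
```

Here $\delta_i = 1$ marks an observed death at time $t_i$, $R(t_i)$ is the set of patients still at risk at $t_i$, $\alpha = 1$ gives the lasso and $\alpha = 0$ gives ridge regression, and variables with $\hat\beta_k \neq 0$ are the ones the penalty retains. $\hat\gamma_k$ denotes the coefficients of the final multivariable Cox model, and $\mathrm{RS}_i$ is the risk score used to rank patients.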
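The C-index values reported in the Results (0.732 in the training set and 0.704 in the validation set) measure how often, among comparable pairs of patients, the patient who died earlier was assigned the higher predicted risk. A minimal pure-Python sketch of Harrell's C-index follows. It is an illustration, not the authors' code: the function name `harrell_c_index` and the toy data are hypothetical, and pairs with tied follow-up times are simply skipped, which is a simplification compared with some software implementations.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair is comparable when the subject with the shorter follow-up time
    had the event (death). The pair is concordant when that subject also has
    the higher risk score; equal risk scores count as half-concordant.
    Pairs with tied follow-up times are skipped (a simplification).
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have equal length")
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # order the pair so that `a` has the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times, or earlier subject censored: not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical toy data: follow-up times, death indicators (1 = died), risk scores
    times = [5, 10, 15, 20]
    events = [1, 1, 0, 1]
    scores = [0.9, 0.7, 0.3, 0.1]
    print(harrell_c_index(times, events, scores))  # 1.0: every comparable pair is concordant
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.732 and 0.704 should be read.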
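The Results compare Kaplan-Meier survival curves of the low-risk and high-risk groups. The abstract does not state how patients were divided into the two groups, so the sketch below assumes a median cutoff on the risk score, a common choice but an assumption here. It implements the standard product-limit (Kaplan-Meier) estimator and a two-group log-rank test in pure Python, taking the p-value from the chi-square distribution with 1 degree of freedom. The function names (`split_by_median`, `kaplan_meier`, `logrank_test`) and the toy cohort are hypothetical; this is an illustration, not the study's analysis pipeline.

```python
import math
from statistics import median


def split_by_median(risk_scores):
    """Assumed cutoff: label 'high' if the score is above the median, else 'low'."""
    cut = median(risk_scores)
    return ["high" if s > cut else "low" for s in risk_scores]


def kaplan_meier(times, events):
    """Product-limit estimate; returns (time, S(t)) at each distinct death time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # collect every subject whose observed time equals t
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # deaths and censorings at t leave the risk set
    return curve


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, p-value), df = 1."""
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("exactly two groups are required")
    ref = labels[0]
    death_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in death_times:
        # risk set at t: everyone whose observed time is >= t
        risk_set = [(g, e, ti) for ti, e, g in zip(times, events, groups) if ti >= t]
        n = len(risk_set)
        n_ref = sum(1 for g, _, _ in risk_set if g == ref)
        d = sum(1 for _, e, ti in risk_set if ti == t and e)
        d_ref = sum(1 for g, e, ti in risk_set if ti == t and e and g == ref)
        obs_minus_exp += d_ref - d * n_ref / n
        if n > 1:
            frac = n_ref / n
            variance += d * frac * (1 - frac) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("variance is zero; groups cannot be compared")
    stat = obs_minus_exp ** 2 / variance
    # chi-square (1 df) upper tail: P(X > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # Hypothetical toy cohort: times in months, 1 = death, 0 = censored
    times = [1, 2, 3, 4]
    events = [1, 1, 1, 1]
    scores = [2.0, 1.5, 0.4, 0.1]
    groups = split_by_median(scores)  # ['high', 'high', 'low', 'low']
    for label in ("low", "high"):
        idx = [k for k, g in enumerate(groups) if g == label]
        print(label, kaplan_meier([times[k] for k in idx], [events[k] for k in idx]))
    print(logrank_test(times, events, groups))
```

The log-rank test compares observed and expected deaths in one group at every death time, so it uses the whole curves rather than a single time point; with only four toy patients the difference is not significant, whereas the study reports P<0.05 on the full cohort.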