Weighted gene co-expression network analysis and machine learning identification of key genes in rheumatoid arthritis synovium
- VernacularTitle:加权共表达网络分析与机器学习识别类风湿关节炎滑膜中的关键基因
- Author: Yingkai WU 1,2; Gaolong SHI; Zonggang XIE
- Keywords: weighted gene co-expression network; machine learning algorithm; rheumatoid arthritis; key gene; prediction model
- From: Chinese Journal of Tissue Engineering Research 2025;29(2):294-301
- Country: China
- Language: Chinese
- Abstract: BACKGROUND: Rheumatoid arthritis is a systemic autoimmune disease characterized by inflammatory hyperplasia in the joints and destruction of articular cartilage. Its pathogenesis remains unclear; therefore, new diagnostic biomarkers with high sensitivity and specificity are urgently needed. OBJECTIVE: To identify and screen key genes in the synovium of rheumatoid arthritis patients using bioinformatics techniques and machine learning algorithms, and to construct and validate a rheumatoid arthritis prediction model. METHODS: Three datasets containing synovial tissue samples from rheumatoid arthritis patients (GSE77298, GSE55235, GSE55457) were downloaded from the Gene Expression Omnibus (GEO) database. GSE77298 and GSE55235 were used as the training set and GSE55457 as the test set, for a total of 66 samples: 39 from rheumatoid arthritis patients and 27 from normal synovium. Differentially expressed genes in the training set were identified using R, and weighted gene co-expression network analysis was then used to group the training-set genes into modules. The module most strongly associated with rheumatoid arthritis was selected, and the feature genes within it were identified. The differentially expressed genes were intersected with the module feature genes, and the intersected genes were passed to the machine learning analysis. Three machine learning methods, namely the least absolute shrinkage and selection operator algorithm, support vector machine with recursive feature elimination, and the random forest algorithm, were each applied to the intersected genes to identify hub genes. The hub genes from the three algorithms were then intersected to obtain the key genes in the rheumatoid arthritis synovium. A rheumatoid arthritis prediction model was constructed using these key genes as variables, and the risk of developing rheumatoid arthritis was estimated from the model. The receiver operating characteristic curve was used to assess the diagnostic value of the prediction model and of its key genes. RESULTS AND CONCLUSION: Differential analysis identified 730 differentially expressed genes in the training set, and 185 feature genes were identified in the feature module from the weighted gene co-expression network analysis. Their intersection yielded 159 genes. The least absolute shrinkage and selection operator algorithm identified 4 hub genes, support vector machine with recursive feature elimination identified 11, and the random forest algorithm identified 5. Intersecting these results gave 2 key genes (TNS3 and SDC1). A nomogram model based on the two key genes was constructed in the training and test sets; the calibration prediction curve fitted the standard curve well, and the model showed good clinical efficacy in predicting the onset of rheumatoid arthritis. These findings indicate that TNS3 and SDC1, identified through bioinformatics and machine learning algorithms, may become key targets for the diagnosis and treatment of rheumatoid arthritis.
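
The final selection step and the ROC evaluation described in the abstract can be illustrated with a minimal Python sketch; this is not the authors' code, which the abstract says was written in R. The abstract reports only the number of hub genes per algorithm (4, 11 and 5) and the two genes they share, so every `GENE_*` name below is a hypothetical placeholder, and the toy labels and scores are invented purely to exercise the AUC function.

```python
from typing import Dict, Sequence, Set


def intersect_hub_genes(selections: Dict[str, Sequence[str]]) -> Set[str]:
    """Return the genes chosen by every selector (the "key genes")."""
    gene_sets = [set(genes) for genes in selections.values()]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive sample scores
    higher; tied pairs count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical selector outputs. Only the list sizes (4, 11, 5) and the
# shared genes TNS3 and SDC1 come from the abstract; GENE_* are placeholders.
hub_genes = {
    "lasso": ["TNS3", "SDC1", "GENE_A", "GENE_B"],
    "svm_rfe": ["TNS3", "SDC1", "GENE_A", "GENE_C", "GENE_E", "GENE_F",
                "GENE_G", "GENE_H", "GENE_I", "GENE_J", "GENE_K"],
    "random_forest": ["TNS3", "SDC1", "GENE_B", "GENE_C", "GENE_D"],
}
key_genes = intersect_hub_genes(hub_genes)

# Toy data (invented): 1 = rheumatoid arthritis, 0 = normal synovium.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)

print(sorted(key_genes))
print(round(toy_auc, 3))
```

Requiring agreement among all three selectors is a conservative design: a gene survives only if a sparse linear model, a margin-based ranking and a tree ensemble each retain it, which is why the 4, 11 and 5 hub genes reduce to 2 key genes.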