1.Analysis on influencing factors for occurrence of angina pectoris in diabetic mellitus patients and its Bayesian network risk prediction
Shuang LI ; Jiayu GE ; Xianzhu CONG ; Aimin WANG ; Yujia KONG ; Fuyan SHI ; Suzhen WANG
Journal of Jilin University(Medicine Edition) 2025;51(4):1028-1038
Objective: To investigate the influencing factors of angina pectoris in patients with diabetes mellitus (DM), to construct a Bayesian network model to explore the network relationships among these factors, and to predict the risk of angina pectoris in patients with DM. Methods: Based on the UK Biobank (UKB) database, a logistic regression analysis model was used to screen the influencing factors of angina pectoris in patients with DM. The tabu search algorithm was used for structure learning, and the Bayesian parameter estimation method was used for parameter learning to construct the Bayesian network model. Results: A total of 22 712 DM patients were included. The influencing factors of angina pectoris in patients with DM comprised 14 variables: gender, age, body mass index (BMI), triglycerides (TG), total cholesterol (TC), glycated hemoglobin (HbA1c), hypertension, maternal smoking around delivery, smoking status, alcohol consumption, regular exercise, insomnia, sleep duration, and childhood relative body size (P<0.05). A Bayesian network model was constructed with 15 nodes and 22 directed edges. Among them, age, HbA1c, hypertension, regular exercise, BMI, and sleep duration were directly associated with the occurrence of angina pectoris in patients with DM, while gender, smoking status, alcohol consumption, TC, TG, insomnia, childhood relative body size, and maternal smoking around delivery were indirectly associated with it. Conclusion: Age, HbA1c, hypertension, regular exercise, BMI, and sleep duration are direct influencing factors of angina pectoris in patients with DM. Controlling HbA1c, blood pressure, and BMI, engaging in regular exercise, and maintaining an appropriate sleep duration help reduce the risk of angina pectoris in patients with DM.
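The parameter-learning step named in the Methods can be illustrated with a minimal sketch. It estimates one conditional probability table, P(angina | hypertension), as the posterior mean under a symmetric Dirichlet prior, which is one common form of Bayesian parameter estimation; the abstract does not state which prior the authors used. The function name `bayes_cpt`, the prior strength `alpha`, and the toy records are all assumptions for illustration and are not UK Biobank data.

```python
from collections import Counter

def bayes_cpt(records, child, parent, states, alpha=1.0):
    """Posterior-mean estimate of P(child | parent) under a symmetric
    Dirichlet(alpha) prior: (count + alpha) / (parent_total + alpha * r),
    where r is the number of child states."""
    counts = Counter((rec[parent], rec[child]) for rec in records)
    r = len(states[child])
    cpt = {}
    for p in states[parent]:
        parent_total = sum(counts[(p, c)] for c in states[child])
        denom = parent_total + alpha * r
        cpt[p] = {c: (counts[(p, c)] + alpha) / denom for c in states[child]}
    return cpt

# Hypothetical toy records (assumption; NOT taken from the study).
records = (
    [{"hypertension": "yes", "angina": "yes"}] * 3
    + [{"hypertension": "yes", "angina": "no"}] * 5
    + [{"hypertension": "no", "angina": "yes"}] * 1
    + [{"hypertension": "no", "angina": "no"}] * 9
)
states = {"hypertension": ["yes", "no"], "angina": ["yes", "no"]}

cpt = bayes_cpt(records, "angina", "hypertension", states)
for parent_state, dist in cpt.items():
    print(parent_state, dist)
```

In a full network, one such table would be estimated for every node given its parents in the learned structure; the prior keeps estimates away from 0 and 1 when a parent configuration is rare.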
2.Construction of diagnostic model for Alzheimer's disease and immune analysis based on bioinformatics and machine learning
Linrui XU ; Yiyu ZHANG ; Jiaqi CUI ; Xianzhu CONG ; Shuang LI ; Jiayu GE ; Yujia KONG ; Suzhen WANG ; Fuyan SHI ; Jinrong WANG
Journal of Jilin University(Medicine Edition) 2025;51(4):1039-1051
Objective: To screen Alzheimer's disease (AD)-related genes and construct a diagnostic model using bioinformatics technology and machine learning (ML) algorithms, to investigate the immunological characteristics of AD patients, and to provide novel biomarkers for AD diagnosis. Methods: The AD-related gene expression dataset GSE125583 was downloaded from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) were identified through differential analysis. Gene Ontology (GO) functional enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) signaling pathway enrichment analyses were performed to explore the biological functions and signaling pathways of the DEGs. A protein-protein interaction (PPI) network was constructed, and hub genes were screened using Cytoscape software combined with three ML algorithms: Least Absolute Shrinkage and Selection Operator (LASSO), eXtreme Gradient Boosting (XGBoost), and Random Forest (RF). The screened hub genes were used to build an AD diagnostic model via RF, followed by feature importance ranking. The model's efficacy and key genes were evaluated using a test set. Single-sample gene set enrichment analysis (ssGSEA) was used for immune cell infiltration analysis between the AD group and the control group. Results: Differential analysis identified 1 287 DEGs. The GO functional enrichment analysis revealed that the DEGs were primarily involved in biological functions related to neural signaling, synapses, and vesicles. KEGG signaling pathway enrichment analysis indicated significant enrichment of the DEGs in ion transport, neurotransmitter, and ligand-gated channel pathways. Nine overlapping hub genes were screened by the three ML algorithms. In the AD diagnostic model, the top four key genes with the highest diagnostic performance were adenylate cyclase-activating polypeptide 1 (ADCYAP1), brain-derived neurotrophic factor (BDNF), platelet-derived growth factor receptor β (PDGFRB), and C-X-C motif chemokine receptor 4 (CXCR4), with corresponding area under the curve (AUC) values of 0.852, 0.795, 0.820, and 0.756, respectively. The model achieved an AUC of 0.828, accuracy of 81.25%, sensitivity of 84.40%, and specificity of 71.43%. The immune cell infiltration analysis demonstrated higher infiltration of macrophages, monocytes, natural killer (NK) cells, and lymphocytes in AD tissue. Among these, NK/natural killer T (NKT) cells and plasmacytoid dendritic cells showed significant correlations with the four key genes (P<0.05). Conclusion: The feature genes screened based on bioinformatics and ML exhibit diagnostic potential for AD. Genes such as ADCYAP1 may serve as potential biomarkers for AD diagnosis, offering significant implications for early prevention and treatment.
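As a reading aid for the accuracy, sensitivity, and specificity figures above, the following sketch shows how these three metrics are derived from a 2x2 confusion matrix. The counts are hypothetical: the abstract does not report a confusion matrix, and these numbers were chosen only because they are arithmetically consistent with the reported percentages. The function name `diagnostic_metrics` is likewise an assumption.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic metrics from confusion-matrix counts.

    tp/fn: diseased samples classified correctly / incorrectly
    tn/fp: control samples classified correctly / incorrectly
    """
    return {
        "sensitivity": tp / (tp + fn),            # true positive rate
        "specificity": tn / (tn + fp),            # true negative rate
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Hypothetical counts (assumption; not reported in the source).
metrics = diagnostic_metrics(tp=92, fn=17, tn=25, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Because sensitivity and specificity are computed on different subgroups, a test set dominated by AD samples can show high accuracy even when specificity is noticeably lower, as in the figures reported here.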
3.CatBoost algorithm and Bayesian network model analysis based on risk prediction of cardiovascular and cerebrovascular diseases
Aimin WANG ; Fenglin WANG ; Yiming HUANG ; Yaqi XU ; Wenjing ZHANG ; Xianzhu CONG ; Weiqiang SU ; Suzhen WANG ; Mengyao GAO ; Shuang LI ; Yujia KONG ; Fuyan SHI ; Enxue TAO
Journal of Jilin University(Medicine Edition) 2024;50(4):1044-1054
Objective: To screen the main characteristic variables affecting the incidence of cardiovascular and cerebrovascular diseases, to construct a Bayesian network model of incidence risk based on the top 10 characteristic variables, and to provide a reference for predicting the risk of cardiovascular and cerebrovascular disease incidence. Methods: From the UK Biobank database, 315 896 participants and related variables were included. Feature selection was performed with the categorical boosting (CatBoost) algorithm, and the participants were randomly divided into a training set and a test set in a ratio of 7:3. A Bayesian network model was constructed based on the max-min hill-climbing (MMHC) algorithm. Results: The prevalence of cardiovascular and cerebrovascular diseases in this study was 28.8%. The top 10 variables selected by the CatBoost algorithm were age, body mass index (BMI), low-density lipoprotein cholesterol (LDL-C), total cholesterol (TC), the triglyceride-glucose (TyG) index, family history, apolipoprotein A/B ratio, high-density lipoprotein cholesterol (HDL-C), smoking status, and gender. The area under the receiver operating characteristic (ROC) curve (AUC) for the CatBoost training set model was 0.770, and the model accuracy was 0.764; the AUC of the validation set model was 0.759, and the model accuracy was 0.763. The clinical efficacy analysis showed that the threshold range for the training set was 0.06-0.85 and the threshold range for the validation set was 0.09-0.81. The Bayesian network model analysis indicated that age, gender, smoking status, family history, BMI, and apolipoprotein A/B ratio were directly related to the incidence of cardiovascular and cerebrovascular diseases and were significant risk factors. The TyG index, HDL-C, LDL-C, and TC indirectly affected the risk of cardiovascular and cerebrovascular diseases through their impact on BMI and apolipoprotein A/B ratio. Conclusion: Controlling BMI, apolipoprotein A/B ratio, and smoking behavior can reduce the incidence risk of cardiovascular and cerebrovascular diseases. The Bayesian network model can be used to predict the risk of cardiovascular and cerebrovascular disease incidence.
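The evaluation setup described in the Methods and Results (a random 7:3 split, then AUC on each part) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the authors' pipeline: the helper names `split_7_3` and `auc`, the seed, and the toy labels and scores are hypothetical, and the AUC is computed by the pairwise (Mann-Whitney) definition rather than by any particular library.

```python
import random

def split_7_3(items, seed=0):
    """Shuffle a copy of `items` and split it 70% / 30%."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data (assumption; not from the study).
train, test = split_7_3(range(10))
toy_auc = auc(labels=[1, 1, 0, 0], scores=[0.9, 0.4, 0.6, 0.1])
print(len(train), len(test), toy_auc)
```

The pairwise form is quadratic in the number of cases, so it suits small illustrations only; for a cohort of the size reported here, a rank-based implementation would be the practical choice.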
