1.A prognostic model for multiple myeloma based on lipid metabolism related genes.
Zhengjiang LI ; Liang ZHAO ; Fangming SHI ; Jiaojiao GUO ; Wen ZHOU
Journal of Central South University(Medical Sciences) 2025;50(4):517-530
OBJECTIVES:
Multiple myeloma (MM) is a highly heterogeneous hematologic malignancy, with disease progression driven by cytogenetic abnormalities and a complex bone marrow microenvironment. This study aims to construct a prognostic model for MM based on transcriptomic data and lipid metabolism related genes (LRGs), and to identify potential drug targets for high-risk patients to support clinical decision-making.
METHODS:
In this study, 2 transcriptomic datasets covering 985 newly diagnosed MM patients were retrieved from the Gene Expression Omnibus (GEO) database. Univariate Cox regression and 101 machine learning algorithms were used for gene selection. An LRG-based prognostic model was constructed using Stepwise Cox (both directions) and random survival forest (RSF) algorithms. The association between the prognostic score and clinical events was evaluated, and model performance was assessed using time-dependent receiver operating characteristic (ROC) curves and the C-index. The added predictive value of combining prognostic scores with clinical variables and staging systems was also analyzed. Differentially expressed genes between high- and low-risk groups were identified using limma and subjected to pathway enrichment analysis using clusterProfiler. Drug sensitivity analysis was conducted using the Genomics of Drug Sensitivity in Cancer (GDSC) database and oncoPredict to identify potential therapeutic targets for high-risk patients. The functional role of key LRGs in the model was validated via in vitro cell experiments.
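The C-index used to assess the model can be computed directly from follow-up times, event indicators, and risk scores. The sketch below implements Harrell's C in plain Python on made-up data; it illustrates the metric and is not the authors' code. For simplicity it skips pairs with tied survival times, which dedicated survival libraries handle more carefully.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float],
                    events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event and
    patient j was still under follow-up later (times[j] > times[i]).
    The pair is concordant when the model assigned i the higher risk;
    tied risk scores count as half-concordant. Tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (not from the study): a higher score should mean earlier death.
times = [5.0, 12.0, 30.0, 48.0]   # months of follow-up
events = [1, 1, 0, 1]             # 1 = death observed, 0 = censored
scores = [2.1, 1.4, 0.9, 0.3]     # hypothetical prognostic scores
print(harrell_c_index(times, events, scores))  # 1.0
```

A C-index of 0.5 corresponds to random ordering of patients, and 1.0 to perfect ordering by risk.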
RESULTS:
An LRG-based prognostic model (LRG17) was successfully developed using transcriptomic data and machine learning. The model demonstrated robust predictive performance, with area under the curve (AUC) values of 0.962, 0.912, and 0.842 for 3-, 5-, and 7-year survival, respectively. Patients were stratified into high- and low-risk groups, with high-risk patients showing significantly shorter overall survival (OS) and event-free survival (EFS) (both P<0.001) and worse clinical profiles (e.g., lower albumin, higher β2-microglobulin and lactate dehydrogenase levels). Enrichment analysis revealed that high-risk patients were significantly enriched for pathways related to chromosome segregation and mitosis, whereas low-risk patients were enriched for immune response and immune cell activation pathways. Drug screening suggested that AURKA inhibitor BMS-754807 and FGFR3 inhibitor I-BET-762 may be more effective in high-risk patients. Functional assays demonstrated that silencing of key LRG PLA2G4A significantly inhibited cell viability and induced apoptosis.
CONCLUSIONS:
LRGs serve as promising biomarkers for prognosis prediction and risk stratification in MM. The overexpression of chromosomal instability-related and high-risk genetic event-associated genes in high-risk patients may explain their poorer outcomes. Given the observed resistance to bortezomib and lenalidomide in high-risk patients, combination therapies involving BMS-754807 or I-BET-762 may represent effective alternatives.
Humans; Multiple Myeloma/mortality*; Prognosis; Lipid Metabolism/genetics*; Transcriptome; Machine Learning; Male; Female; Gene Expression Profiling; Algorithms
2.Construction of a treatment response prediction model for multiple myeloma based on multi-omics and machine learning.
Xionghui ZHOU ; Rong GUI ; Jing LIU ; Meng GAO
Journal of Central South University(Medical Sciences) 2025;50(4):531-544
OBJECTIVES:
Multiple myeloma (MM) is a hematologic malignancy characterized by clonal proliferation of plasma cells and remains incurable. Patients with primary refractory multiple myeloma (PRMM) show poor response to initial induction therapy. This study aims to develop a machine learning-based model to predict treatment response in newly diagnosed multiple myeloma (NDMM) patients, in order to optimize therapeutic strategies.
METHODS:
NDMM and post-treatment MM patients hospitalized in the Department of Hematology, Third Xiangya Hospital, Central South University, between August 2022 and July 2023 were enrolled. Post-treatment MM patients were categorized into PRMM patients and treatment-responsive MM (TRMM) patients based on therapeutic efficacy. Serum metabolites were detected and analyzed via metabolomics. Based on the metabolomics results, combined with transcriptomic sequencing data of NDMM patients from databases, differentially expressed amino acid metabolism-related genes (AAMGs) were screened among post-treatment NDMM patients with different therapeutic outcomes. Using bioinformatics analyses and machine learning algorithms, a predictive model for treatment response in NDMM was constructed and used to identify patients at risk for PRMM.
RESULTS:
A total of 61 patients were included: 22 NDMM, 23 TRMM, and 16 PRMM patients. Significant differences in metabolite levels were observed among the 3 groups, with differential metabolites mainly enriched in amino acid metabolism pathways. Follow-up data were available for 16 of the 22 NDMM patients, including 12 treatment responders (ND_TR group) and 4 with PRMM (ND_PR group). A total of 23 differential metabolites were identified between these 2 groups: 6 metabolites (e.g., tryptophan) were upregulated and 17 (e.g., citric acid) were downregulated in the ND_TR group. Transcriptomic data from 108 TRMM and 77 PRMM patients were analyzed to identify differentially expressed AAMGs, which were then used to construct a prediction model. The area under the receiver operating characteristic curve (AUC) for the model exceeded 0.8, and AUC values in 3 external validation cohorts were all above 0.7.
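An AUC above 0.8 has a concrete reading: it is the probability that a randomly chosen patient from the positive class (here, for illustration, a poor responder) receives a higher predicted score than a randomly chosen patient from the negative class. The plain-Python sketch below computes AUC in that pairwise (Mann-Whitney) form on toy scores; it is an illustration of the metric, not the study's pipeline.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random positive case scores higher
    than a random negative case (Mann-Whitney form); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (not study data): label 1 = poor response, 0 = response.
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

In the toy example, 3 of the 4 positive/negative pairs are ordered correctly, giving 0.75.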
CONCLUSIONS:
This study delineated the metabolic alterations in MM patients with different treatment responses, suggesting that dysregulated amino acid metabolism may be associated with poor treatment response in PRMM. By integrating metabolomics and transcriptomics, a machine learning-based predictive model was successfully established to forecast treatment response in NDMM patients.
Humans; Multiple Myeloma/drug therapy*; Machine Learning; Male; Female; Metabolomics/methods*; Middle Aged; Aged; Treatment Outcome; Transcriptome; Computational Biology; Adult; Multiomics
3.Radiogenomics-based prediction of KRAS and EGFR gene mutation in non-small cell lung cancer patients.
Jianing LIN ; Zhihang YAN ; Longyu HE ; Hao ZHANG ; Mingxuan XIE
Journal of Central South University(Medical Sciences) 2025;50(5):805-814
OBJECTIVES:
Non-small cell lung cancer (NSCLC) is associated with poor prognosis, with 30% of patients diagnosed at an advanced stage. Mutations in the EGFR and KRAS genes are important prognostic factors for NSCLC, and targeted therapies can significantly improve survival in these patients. Although tissue biopsy remains the gold standard for detecting gene mutations, it has limitations, including invasiveness, sampling errors due to tumor heterogeneity, and poor reproducibility. This study aims to develop machine learning models based on radiomic features to predict EGFR and KRAS gene mutation status in NSCLC patients, thereby providing a reference for precision oncology.
METHODS:
Imaging and mutation data from eligible NSCLC patients were obtained from the publicly available Lung-PET-CT-Dx dataset in The Cancer Imaging Archive (TCIA). A three-dimensional convolutional neural network (3D-CNN) was used to extract imaging features from the regions of interest (ROIs). The LightGBM algorithm was employed to build classification models for predicting EGFR and KRAS mutation status. Model performance was evaluated using 5-fold cross-validation, with receiver operating characteristic (ROC) curves, area under the curve (AUC), accuracy, sensitivity, and specificity as evaluation metrics.
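Accuracy, sensitivity, and specificity all derive from the same confusion matrix. The sketch below computes them; the counts used are hypothetical and were chosen only because, if the metrics were pooled over the cross-validation folds, they reproduce the reported percentages exactly (EGFR: 14/15, 12/14, 26/29; KRAS: 14/16, 13/15, 27/31). The abstract does not report the actual confusion matrices.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # mutation present, predicted present
    fn: int  # mutation present, predicted absent
    tn: int  # wild type, predicted wild type
    fp: int  # wild type, predicted mutant

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total


# Hypothetical counts consistent with the reported figures (not from the paper).
egfr = ConfusionCounts(tp=14, fn=1, tn=12, fp=2)
kras = ConfusionCounts(tp=14, fn=2, tn=13, fp=2)
for name, cm in (("EGFR", egfr), ("KRAS", kras)):
    print(name, f"{cm.accuracy():.2%}", f"{cm.sensitivity():.2%}",
          f"{cm.specificity():.2%}")
```

With test sets this small, a single misclassified case shifts each percentage by several points, which is worth keeping in mind when reading the reported values.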
RESULTS:
The models effectively predicted EGFR and KRAS mutations in NSCLC patients, achieving an AUC of 0.95 for EGFR mutations and 0.90 for KRAS. The models also demonstrated high accuracy (EGFR 89.66%; KRAS 87.10%), sensitivity (EGFR 93.33%; KRAS 87.50%), and specificity (EGFR 85.71%; KRAS 86.67%).
CONCLUSIONS:
A radiogenomics-machine learning predictive model can serve as a non-invasive tool for anticipating EGFR and KRAS gene mutation status in NSCLC patients.
Humans; Carcinoma, Non-Small-Cell Lung/diagnostic imaging*; Lung Neoplasms/diagnostic imaging*; Mutation; Proto-Oncogene Proteins p21(ras)/genetics*; ErbB Receptors/genetics*; Machine Learning; Positron Emission Tomography Computed Tomography; Female; Male; Neural Networks, Computer; Middle Aged; Aged
4.Nomogram and machine learning models for predicting in-hospital mortality in sepsis patients with deep vein thrombosis.
Hongwei DUAN ; Huaizheng LIU ; Chuanzheng SUN ; Jing QI
Journal of Central South University(Medical Sciences) 2025;50(6):1013-1029
OBJECTIVES:
Global epidemiological data indicate that 20% to 30% of intensive care unit (ICU) sepsis patients progress to deep vein thrombosis (DVT) due to coagulopathy, with an associated mortality rate of 25% to 40%. Existing prognostic tools have limitations. This study aims to develop and validate nomogram and machine learning models to predict in-hospital mortality in sepsis patients with DVT and assess their clinical applicability.
METHODS:
This multicenter retrospective study drew on data from the Medical Information Mart for Intensive Care IV (MIMIC-IV; n=2 235), the eICU Collaborative Research Database (eICU-CRD; n=1 274), and the Patient Admission Dataset from the ICU of Third Xiangya Hospital, Central South University (CSU-XYS-ICU; n=107). MIMIC-IV was split into a training set (n=1 584) and internal validation set (n=651), with the remaining datasets used for external validation. Predictors were selected via least absolute shrinkage and selection operator (LASSO) regression and Bayesian Information Criterion (BIC), and a nomogram model was constructed. An extreme gradient boosting (XGBoost) algorithm was used to build the machine learning model. Model performance was assessed by the concordance index (C-index), calibration curves, Brier score, decision curve analysis (DCA), and net reclassification improvement index (NRI).
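Two of the listed metrics are easy to compute by hand. The Brier score is the mean squared difference between predicted probability and the observed 0/1 outcome; a non-informative model that always predicts 0.5 scores exactly 0.25, which is presumably why "Brier score < 0.25" is used as a reference. The sketch below also shows a category-free (continuous) NRI; the abstract does not say whether the study used this form or a categorical NRI with risk thresholds, so treat that choice as an assumption.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def continuous_nri(p_old: Sequence[float], p_new: Sequence[float],
                   outcomes: Sequence[int]) -> float:
    """Category-free net reclassification improvement of p_new over p_old.

    Events should move up in predicted risk, non-events should move down:
    NRI = [P(up|event) - P(down|event)] + [P(down|non-event) - P(up|non-event)].
    """
    up_e = down_e = n_e = 0
    up_ne = down_ne = n_ne = 0
    for old, new, y in zip(p_old, p_new, outcomes):
        if y == 1:
            n_e += 1
            up_e += int(new > old)
            down_e += int(new < old)
        else:
            n_ne += 1
            up_ne += int(new > old)
            down_ne += int(new < old)
    return (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne


# A model that always predicts 0.5 has a Brier score of exactly 0.25.
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # 0.25
```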
RESULTS:
Five key predictors, age [odds ratio (OR)=1.02, 95% CI 1.01 to 1.03, P<0.001], minimum activated partial thromboplastin time (APTT; OR=1.09, 95% CI 1.08 to 1.11, P<0.001), maximum APTT (OR=1.01, 95% CI 1.00 to 1.01, P<0.001), maximum lactate (OR=1.56, 95% CI 1.39 to 1.75, P<0.001), and maximum serum creatinine (OR=2.03, 95% CI 1.79 to 2.30, P<0.001), were included in the nomogram. The model showed robust performance in internal validation (C-index=0.845, 95% CI 0.811 to 0.879) and external validation (eICU-CRD: C-index=0.827, 95% CI 0.800 to 0.854; CSU-XYS-ICU: C-index=0.779, 95% CI 0.687 to 0.871). Calibration curves indicated good agreement between predicted and observed outcomes (Brier score<0.25), and DCA confirmed clinical benefit. The XGBoost model achieved an area under the receiver operating characteristic curve (AUC) of 0.982 (95% CI 0.969 to 0.985) in the training set, but performance declined in external validation (eICU-CRD, AUC=0.825, 95% CI 0.817 to 0.861; CSU-XYS-ICU, AUC=0.766, 95% CI 0.700 to 0.873), though it remained above clinical thresholds. Net reclassification improvement was slightly lower for XGBoost than for the nomogram (NRI=0.58).
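Because the nomogram rests on a logistic model, each reported OR multiplies the odds of in-hospital death per one-unit increase in its predictor, and the effects of several predictors multiply (their log-odds add). The sketch below converts the reported ORs into odds multipliers for a given covariate change. It assumes each OR is per one unit of the predictor's recorded scale (units are not stated in the abstract) and that the model has no interaction terms; it cannot give an absolute probability because the intercept is not reported.

```python
import math

# Reported odds ratios per one-unit increase (units not given in the abstract).
ODDS_RATIOS = {
    "age": 1.02,
    "aptt_min": 1.09,
    "aptt_max": 1.01,
    "lactate_max": 1.56,
    "creatinine_max": 2.03,
}


def odds_multiplier(changes: dict[str, float]) -> float:
    """Factor by which the odds of death change for the given covariate
    differences, holding the other predictors fixed.

    In a logistic model beta = ln(OR) and log-odds are additive in
    beta * delta, so the combined multiplier is exp(sum(beta * delta)).
    """
    log_odds_change = sum(math.log(ODDS_RATIOS[name]) * delta
                          for name, delta in changes.items())
    return math.exp(log_odds_change)


print(round(odds_multiplier({"age": 10}), 3))          # 1.219
print(round(odds_multiplier({"lactate_max": 2}), 3))   # 2.434
print(round(odds_multiplier({"age": 10, "lactate_max": 2}), 3))
```

For example, 10 extra years of age raise the odds by about 22%, while a 2-unit rise in maximum lactate more than doubles them.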
CONCLUSIONS:
Both the nomogram and XGBoost models effectively predict in-hospital mortality in sepsis patients with DVT. However, the nomogram offers superior generalizability and clinical usability. Its visual scoring system provides a quantitative tool for identifying high-risk patients and implementing individualized interventions.
Humans; Sepsis/complications*; Machine Learning; Nomograms; Venous Thrombosis/complications*; Retrospective Studies; Hospital Mortality; Male; Female; Middle Aged; Aged; Intensive Care Units; Prognosis; Bayes Theorem
5.Mediating role of insulin resistance in the relationship between hypertension and NAFLD and construction of its risk prediction model.
Yaxuan HE ; Honghui HE ; Yu CAO ; Fang WANG
Journal of Central South University(Medical Sciences) 2025;50(7):1188-1201
OBJECTIVES:
Non-alcoholic fatty liver disease (NAFLD) and hypertension are common metabolic disorders, both closely associated with insulin resistance (IR), suggesting potential shared pathological mechanisms. This study aims to investigate the mediating role of IR in the relationship between hypertension and NAFLD, and to evaluate the applicability and modeling value of various IR surrogate indices in predicting NAFLD risk.
METHODS:
A total of 280 976 individuals who underwent health examinations at the Health Management Center of the Third Xiangya Hospital of Central South University between August 2017 and December 2021 were included. NAFLD was diagnosed based on abdominal ultrasound findings, and hypertension was defined according to the criteria of the Chinese Guidelines for the Management of Hypertension. Demographic information, anthropometric indices, and biochemical parameters were collected, and multiple IR surrogate indices were constructed, including the triglyceride-glucose index (TyG) and its derivatives, as well as the metabolic score for insulin resistance (METS-IR). Group comparisons were performed between hypertensive and non-hypertensive participants, as well as between NAFLD and non-NAFLD participants. Pearson correlation analysis was applied to assess the associations of metabolic parameters and IR indices with NAFLD. Furthermore, mediation models were constructed to explore the mediating role of IR in the "hypertension-NAFLD" relationship. Finally, parametric models and machine learning algorithms were compared to evaluate their predictive performance and value in assessing NAFLD risk in this population.
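TyG, TyG-WC, and METS-IR are closed-form indices. The sketch below uses their commonly published definitions, with triglycerides (TG), fasting plasma glucose (FPG), and HDL cholesterol in mg/dL; the abstract does not restate the formulas or units, so check them against the full text. Values recorded in mmol/L need converting first (commonly about ×88.57 for TG, ×18 for glucose, and ×38.67 for HDL cholesterol).

```python
import math


def tyg(tg_mg_dl: float, fpg_mg_dl: float) -> float:
    """Triglyceride-glucose index: ln[TG (mg/dL) x FPG (mg/dL) / 2]."""
    return math.log(tg_mg_dl * fpg_mg_dl / 2)


def tyg_wc(tg_mg_dl: float, fpg_mg_dl: float, waist_cm: float) -> float:
    """TyG-waist circumference: TyG multiplied by waist circumference in cm."""
    return tyg(tg_mg_dl, fpg_mg_dl) * waist_cm


def mets_ir(tg_mg_dl: float, fpg_mg_dl: float, bmi: float,
            hdl_mg_dl: float) -> float:
    """Metabolic score for insulin resistance:
    ln(2 x FPG + TG) x BMI / ln(HDL-C), with TG, FPG, HDL-C in mg/dL."""
    return math.log(2 * fpg_mg_dl + tg_mg_dl) * bmi / math.log(hdl_mg_dl)


# Hypothetical participant (values invented for illustration).
print(round(tyg(150, 100), 3))
print(round(tyg_wc(150, 100, 90), 1))
print(round(mets_ir(150, 100, 25, 50), 2))
```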
RESULTS:
The prevalence of NAFLD was significantly higher in hypertensive individuals than in non-hypertensive participants (63.61% vs 33.79%, P<0.001), accompanied by elevated IR levels and adverse metabolic features. Correlation analysis and variable importance rankings across multiple models consistently identified TyG-waist circumference (TyG-WC) and METS-IR as the IR indices most strongly associated with NAFLD. In mediation analysis, the TyG-WC pathway explained 32.03% of the total effect, and the METS-IR pathway explained 17.02%. Interaction analysis showed that hypertension status may attenuate the mediating effect of IR (all interaction estimates were negative). In prediction model comparisons, the simplified model incorporating sex, age, WC, TyG-WC, and METS-IR demonstrated good performance in the test set. Logistic regression and its regularized form (LASSO regression) achieved an accuracy of 0.83, receiver operating characteristic (ROC)-area under the curve (AUC) of 0.91, and a Brier score of 0.12, comparable to ensemble models (random forest and XGBoost), with consistently stable performance across different algorithms.
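The percentages "of the total effect" are proportions mediated: indirect effect divided by total effect. A minimal product-of-coefficients sketch with linear regressions on simulated data follows, where `a` is the exposure-to-mediator slope, `b` the mediator-to-outcome slope adjusted for exposure (obtained with the Frisch-Waugh-Lovell residual trick), and `c` the total effect. This is a simplification: NAFLD is binary and the abstract does not specify the mediation estimator, so the study's actual method (for example, logistic or counterfactual-based mediation) may differ.

```python
import random
from typing import Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope of a simple linear regression of y on x (with intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(v: Sequence[float], x: Sequence[float]) -> list[float]:
    """Residuals of v after a simple linear regression on x (with intercept)."""
    beta = ols_slope(x, v)
    mx = sum(x) / len(x)
    mv = sum(v) / len(v)
    return [b - (mv + beta * (a - mx)) for a, b in zip(x, v)]


def proportion_mediated(x, m, y) -> float:
    a = ols_slope(x, m)  # exposure -> mediator
    c = ols_slope(x, y)  # total effect of exposure on outcome
    # mediator -> outcome, adjusted for exposure (Frisch-Waugh-Lovell)
    b = ols_slope(residualize(m, x), residualize(y, x))
    return a * b / c     # indirect effect / total effect


# Simulated data: exposure ~ hypertension (0/1), mediator ~ an IR index.
rng = random.Random(42)
n = 50000
x = [rng.randint(0, 1) for _ in range(n)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.3 * xi + 0.8 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]
# The true proportion in this simulation is 0.4 / 0.7, about 0.57.
print(round(proportion_mediated(x, m, y), 2))
```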
CONCLUSIONS:
IR plays a significant mediating role in the association between hypertension and NAFLD, with TyG-WC identified as a key indicator showing strong mechanistic relevance and predictive value. Risk prediction models based on IR surrogate indices demonstrate advantages in simplicity and interpretability, providing empirical support for the early screening and individualized prevention of NAFLD in the general population.
Humans; Non-alcoholic Fatty Liver Disease/complications*; Insulin Resistance; Hypertension/epidemiology*; Male; Female; Middle Aged; Risk Factors; Adult; Machine Learning; Triglycerides/blood*
6.Artificial intelligence in natural products research.
Xiao YUAN ; Xiaobo YANG ; Qiyuan PAN ; Cheng LUO ; Xin LUAN ; Hao ZHANG
Chinese Journal of Natural Medicines (English Ed.) 2025;23(11):1342-1357
Artificial intelligence (AI) has emerged as a transformative technology in accelerating drug discovery and development within natural medicines research. Natural medicines, characterized by their complex chemical compositions and multifaceted pharmacological mechanisms, demonstrate widespread application in treating diverse diseases. However, research and development face significant challenges, including component complexity, extraction difficulties, and efficacy validation. AI technology, particularly through deep learning (DL) and machine learning (ML) approaches, enables efficient analysis of extensive datasets, facilitating drug screening, component analysis, and pharmacological mechanism elucidation. The implementation of AI technology demonstrates considerable potential in virtual screening, compound optimization, and synthetic pathway design, thereby enhancing natural medicines' bioavailability and safety profiles. Nevertheless, current applications encounter limitations regarding data quality, model interpretability, and ethical considerations. As AI technologies continue to evolve, natural medicines research and development will achieve greater efficiency and precision, advancing both personalized medicine and contemporary drug development approaches.
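One of the simplest building blocks of ligand-based virtual screening is ranking a compound library by fingerprint similarity to a known active. The sketch below computes Tanimoto similarity on binary fingerprints stored as sets of "on" bit indices; the fingerprints and compound names are hypothetical (real ones would come from a cheminformatics toolkit), and the review itself does not prescribe this particular method.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_library(query: set[int],
                 library: dict[str, set[int]]) -> list[tuple[str, float]]:
    """Library compounds sorted from most to least similar to the query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints for a query natural product and three candidates.
query = {1, 4, 7, 9, 15}
library = {
    "compound_A": {1, 4, 7, 9, 16},
    "compound_B": {2, 3, 5},
    "compound_C": {1, 4, 15},
}
for name, score in rank_library(query, library):
    print(name, round(score, 3))
```

Machine learning approaches typically replace this fixed similarity with a learned scoring function, but the ranking step is the same.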
Biological Products/pharmacology*; Artificial Intelligence; Humans; Drug Discovery/methods*; Machine Learning; Deep Learning
7.Identification of natural product-based drug combination (NPDC) using artificial intelligence.
Tianle NIU ; Yimiao ZHU ; Minjie MOU ; Tingting FU ; Hao YANG ; Huaicheng SUN ; Yuxuan LIU ; Feng ZHU ; Yang ZHANG ; Yanxing LIU
Chinese Journal of Natural Medicines (English Ed.) 2025;23(11):1377-1390
Natural product-based drug combinations (NPDCs) present distinctive advantages in treating complex diseases. While high-throughput screening (HTS) and conventional computational methods have partially accelerated synergistic drug combination discovery, their applications remain constrained by experimental data fragmentation, high costs, and extensive combinatorial space. Recent developments in artificial intelligence (AI), encompassing traditional machine learning and deep learning algorithms, have been extensively applied in NPDC identification. Through the integration of multi-source heterogeneous data and autonomous feature extraction, prediction accuracy has markedly improved, offering a robust technical approach for novel NPDC discovery. This review comprehensively examines recent advances in AI-driven NPDC prediction, presents relevant data resources and algorithmic frameworks, and evaluates current limitations and future prospects. AI methodologies are anticipated to substantially expedite NPDC discovery and inform experimental validation.
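AI models for drug-combination prediction are generally trained on synergy labels derived from combination screens. One widely used reference model is Bliss independence, under which two agents acting independently with fractional effects E_A and E_B are expected to give E_A + E_B - E_A·E_B together. The sketch below computes the Bliss excess as a synergy score; the abstract does not say which synergy model the surveyed methods use, so this is background rather than the review's method, and the example values are invented.

```python
def bliss_excess(effect_a: float, effect_b: float, effect_ab: float) -> float:
    """Observed combination effect minus the Bliss-independence expectation.

    Effects are fractional inhibitions in [0, 1]. A positive excess suggests
    synergy, a negative one antagonism, and zero means independent action.
    """
    for e in (effect_a, effect_b, effect_ab):
        if not 0.0 <= e <= 1.0:
            raise ValueError("effects must be fractions between 0 and 1")
    expected = effect_a + effect_b - effect_a * effect_b
    return effect_ab - expected


# Hypothetical natural product (30% inhibition) plus a drug (40% inhibition)
# giving 70% inhibition together: expected 58%, so the excess is 0.12.
print(round(bliss_excess(0.30, 0.40, 0.70), 2))  # 0.12
```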
Artificial Intelligence; Biological Products/chemistry*; Humans; Drug Combinations; Drug Discovery/methods*; Machine Learning; Algorithms
8.Advances in small molecule representations and AI-driven drug research: bridging the gap between theory and application.
Junxi LIU ; Shan CHANG ; Qingtian DENG ; Yulian DING ; Yi PAN
Chinese Journal of Natural Medicines (English Ed.) 2025;23(11):1391-1408
Artificial intelligence (AI) researchers and cheminformatics specialists strive to identify effective drug precursors while optimizing costs and accelerating development processes. Digital molecular representation plays a crucial role in achieving this objective by making molecules machine-readable, thereby enhancing the accuracy of molecular prediction tasks and facilitating evidence-based decision making. This study presents a comprehensive review of small molecule representations and AI-driven drug discovery downstream tasks utilizing these representations. The review begins with the compilation of small molecule databases, followed by an analysis of fundamental molecular representations and the models that learn these representations from initial forms, capturing patterns and salient features across extensive chemical spaces. It then examines various drug discovery downstream tasks, including drug-target interaction (DTI) prediction, drug-target affinity (DTA) prediction, drug property (DP) prediction, and drug generation, all based on learned representations. The analysis concludes by highlighting challenges and opportunities associated with machine learning (ML) methods for molecular representation and improving downstream task performance. Additionally, small molecule representations and AI-based downstream tasks show significant potential for identifying traditional Chinese medicine (TCM) medicinal substances and facilitating TCM target discovery.
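A SMILES string is one of the fundamental representations such models start from, and sequence models usually split it into chemically meaningful tokens before learning embeddings. The sketch below uses a regular expression of the kind popularized by SMILES-based transformer models (bracket atoms, two-letter halogens, aromatic atoms, bonds, branches, ring-closure digits); the review does not specify a tokenizer, so the exact pattern is an assumption.

```python
import re

# Tokens: bracket atoms, Br/Cl before B/C, organic-subset and aromatic atoms,
# branches, bonds, stereo/charge symbols, %NN and single-digit ring closures.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens; raise if any character is skipped."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
print(tokenize_smiles("Clc1ccc[nH]1"))
```

Checking that the tokens re-join to the input guards against characters the pattern silently skips.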
Artificial Intelligence; Drug Discovery/methods*; Humans; Machine Learning; Medicine, Chinese Traditional; Small Molecule Libraries/chemistry*
9.A multi-constraint representation learning model for identification of ovarian cancer with missing laboratory indicators.
Zihan LU ; Fangjun HUANG ; Guangyao CAI ; Jihong LIU ; Xin ZHEN
Journal of Southern Medical University 2025;45(1):170-178
OBJECTIVES:
To evaluate the performance of a multi-constraint representation learning classification model for identifying ovarian cancer with missing laboratory indicators.
METHODS:
Tabular data with missing laboratory indicators were collected from 393 patients with ovarian cancer and 1951 control patients. Features with missing laboratory indicators were projected into a latent space, and a representation learning classification model was built on discriminative learning and mutual information, coupled with constraints on feature projection significance score consistency and missing location estimation. Ablation experiments on the proposed constraint terms assessed their feasibility and validity using accuracy, area under the ROC curve (AUC), sensitivity, and specificity. Cross-validation with the same metrics was used to compare the discriminative performance of this classification model with that of other imputation methods for handling missing data.
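The comparison above is against conventional imputation methods. A minimal baseline of that kind is sketched below: column-mean imputation plus a binary mask recording where values were missing, which is one common way to give a downstream model "missing location" information. This is not the authors' multi-constraint model, and the laboratory values are invented.

```python
from typing import Optional, Sequence

Row = Sequence[Optional[float]]


def mean_impute_with_mask(rows: Sequence[Row]) -> tuple[list[list[float]], list[list[int]]]:
    """Column-mean imputation plus a mask of missing positions.

    Returns (imputed_rows, mask), where mask[i][j] == 1 if rows[i][j] was None.
    Columns with no observed values are filled with 0.0.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    imputed = [[means[j] if r[j] is None else float(r[j]) for j in range(n_cols)]
               for r in rows]
    mask = [[1 if r[j] is None else 0 for j in range(n_cols)] for r in rows]
    return imputed, mask


# Two hypothetical laboratory indicators for three patients (None = missing).
labs = [[35.0, None], [120.0, 80.0], [None, 60.0]]
imputed, mask = mean_impute_with_mask(labs)
print(imputed)  # [[35.0, 70.0], [120.0, 80.0], [77.5, 60.0]]
print(mask)     # [[0, 1], [0, 0], [1, 0]]
```

Mean imputation ignores relationships between indicators, which is one reason learned representations can outperform it.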
RESULTS:
The ablation experiments showed good compatibility among the constraints, and each constraint was robust. In cross-validation, the proposed multi-constraint representation learning classification model identified ovarian cancer with missing laboratory indicators with an AUC of 0.915, accuracy of 0.888, sensitivity of 0.774, and specificity of 0.910; its AUC and sensitivity were superior to those of other imputation methods.
CONCLUSIONS:
The proposed model shows excellent discriminatory ability for identifying ovarian cancer with missing laboratory indicators and performs better than other missing-data imputation methods.
Female; Humans; Ovarian Neoplasms/diagnosis*; Machine Learning; ROC Curve
10.Construction of recognition models for subthreshold depression based on multiple machine learning algorithms and vocal emotional characteristics.
Meimei CHEN ; Yang WANG ; Huangwei LEI ; Fei ZHANG ; Ruina HUANG ; Zhaoyang YANG
Journal of Southern Medical University 2025;45(4):711-717
OBJECTIVES:
To construct vocal recognition classification models using 6 machine learning algorithms and vocal emotional characteristics of individuals with subthreshold depression to facilitate early identification of subthreshold depression.
METHODS:
We collected voice data from control individuals and participants with subthreshold depression as they read specifically chosen words and texts. From each voice sample, 384-dimensional vocal emotional features were extracted, including energy, Mel-frequency cepstral coefficients (MFCCs), zero-crossing rate, voicing probability, fundamental frequency, and their difference (delta) features. Recursive feature elimination (RFE) was used to select voice features. Classification models were then built using 6 machine learning algorithms: Adaptive Boosting (AdaBoost), Random Forest (RF), Linear Discriminant Analysis (LDA), Logistic Regression (LR), Lasso Regression (LRLasso), and Support Vector Machine (SVM), and their performance was evaluated. To assess generalization, the best-performing classification models were further evaluated on real-world speech data.
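Several of the listed features are frame-level low-level descriptors. The feature list resembles the 384-dimensional INTERSPEECH 2009 emotion feature set usually extracted with openSMILE, but the abstract does not name a toolkit, so that is an assumption. The plain-Python sketch below splits a signal into overlapping frames and computes two of the simplest descriptors, zero-crossing rate and RMS energy, on a synthetic tone rather than real speech.

```python
import math
from typing import Sequence


def frame_signal(samples: Sequence[float], frame_len: int, hop: int) -> list[list[float]]:
    """Split a mono signal into overlapping frames (trailing partial frame dropped)."""
    return [list(samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, hop)]


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms_energy(frame: Sequence[float]) -> float:
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


# Synthetic 200 Hz tone sampled at 8 kHz for 0.1 s (not real speech).
sr = 8000
tone = [math.sin(2 * math.pi * 200 * t / sr) for t in range(sr // 10)]
frames = frame_signal(tone, frame_len=200, hop=80)  # 25 ms frames, 10 ms hop
zcr = [zero_crossing_rate(f) for f in frames]
energy = [rms_energy(f) for f in frames]
print(len(frames), round(zcr[0], 3), round(energy[0], 3))
```

Utterance-level feature vectors are then typically formed by applying summary statistics (mean, standard deviation, extremes) to each frame-level descriptor and its deltas.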
RESULTS:
The AdaBoost, RF, and LDA models achieved high prediction accuracies of 100%, 100%, and 93.3%, respectively, on the word-reading speech test set. On the text-reading speech test set, the accuracies of the AdaBoost, RF, and LDA models were 90%, 80%, and 90%, respectively, while those of the other 3 models were all below 80%. On real-world word-reading and text-reading speech data, the AdaBoost and RF models still achieved high predictive accuracies (91.7% and 80.6% for AdaBoost, and 86.1% and 77.8% for RF, respectively).
CONCLUSIONS:
Analysis of vocal emotional characteristics allows effective identification of individuals with subthreshold depression. The AdaBoost and RF models show excellent performance in classifying individuals with subthreshold depression and may offer valuable assistance in clinical and research settings.
Humans; Machine Learning; Emotions; Depression/diagnosis*; Algorithms; Voice; Support Vector Machine; Male; Female
