1.A multi-constraint representation learning model for identification of ovarian cancer with missing laboratory indicators.
Zihan LU ; Fangjun HUANG ; Guangyao CAI ; Jihong LIU ; Xin ZHEN
Journal of Southern Medical University 2025;45(1):170-178
OBJECTIVES:
To evaluate the performance of a multi-constraint representation learning classification model for identifying ovarian cancer with missing laboratory indicators.
METHODS:
Tabular data with missing laboratory indicators were collected from 393 patients with ovarian cancer and 1951 control patients. The laboratory-indicator features with missing values were projected into a latent space, and a classification model was obtained using representation learning based on discriminative learning and mutual information, coupled with consistency of feature-projection significance scores and missing-location estimation. Ablation experiments on the proposed constraint terms were performed to assess their feasibility and validity using accuracy, area under the ROC curve (AUC), sensitivity, and specificity. Cross-validation with the same four metrics was also used to compare the discriminative performance of this classification model with that of other imputation methods for handling the missing data.
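The four metrics named above have standard definitions that a short sketch can make concrete. The Python below is a generic illustration, not the authors' code: it assumes binary labels with 1 = ovarian cancer, a hypothetical decision threshold of 0.5 (the abstract does not state one), and computes AUC through the rank (Mann-Whitney) formulation, i.e. the probability that a randomly chosen positive receives a higher score than a randomly chosen negative. The helper names `confusion_counts`, `rank_auc`, and `evaluate` are illustrative.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels where 1 marks the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def rank_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity at a fixed threshold, plus threshold-free AUC."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # recall on the cancer class
        "specificity": tn / (tn + fp),  # recall on the control class
        "auc": rank_auc(y_true, scores),
    }


# Toy example (made-up labels and scores, not study data); threshold 0.5 is an assumption.
metrics = evaluate([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

The quadratic loop in `rank_auc` is fine for illustration; for a cohort of 2344 patients a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.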
RESULTS:
The ablation experiments showed good compatibility among the constraints, and each constraint was robust. In cross-validation, the proposed multi-constraint representation learning classification model achieved an AUC, accuracy, sensitivity, and specificity of 0.915, 0.888, 0.774, and 0.910, respectively, for identifying ovarian cancer with missing laboratory indicators, and its AUC and sensitivity were superior to those of the other imputation methods.
CONCLUSIONS
The proposed model has excellent discriminatory ability and outperforms other missing-data imputation methods for identifying ovarian cancer with missing laboratory indicators.
Female
;
Humans
;
Ovarian Neoplasms/diagnosis*
;
Machine Learning
;
ROC Curve
2.Construction of recognition models for subthreshold depression based on multiple machine learning algorithms and vocal emotional characteristics.
Meimei CHEN ; Yang WANG ; Huangwei LEI ; Fei ZHANG ; Ruina HUANG ; Zhaoyang YANG
Journal of Southern Medical University 2025;45(4):711-717
OBJECTIVES:
To construct vocal recognition classification models using 6 machine learning algorithms and vocal emotional characteristics of individuals with subthreshold depression to facilitate early identification of subthreshold depression.
METHODS:
We collected voice data from normal individuals and participants with subthreshold depression who read specifically chosen words and texts aloud. From each voice sample, 384 vocal emotional features were extracted, including energy, Mel-frequency cepstral coefficients (MFCCs), zero-crossing rate, voicing probability, fundamental frequency, and their difference (delta) features. Recursive Feature Elimination (RFE) was used to select voice features. Classification models were then built using six machine learning algorithms: Adaptive Boosting (AdaBoost), Random Forest (RF), Linear Discriminant Analysis (LDA), Logistic Regression (LR), Lasso Regression (LRLasso), and Support Vector Machine (SVM), and the performance of these models was evaluated. To assess the generalization capability of the models, real-world speech data were used to evaluate the best-performing speech classification models.
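RFE repeatedly refits a model and discards the weakest remaining feature. The sketch below assumes a plain least-squares linear model fitted by gradient descent as the base estimator and uses |weight| as the importance score; the abstract does not say which estimator drove RFE in the study, and the synthetic columns merely stand in for the 384 acoustic features. Feature columns should be on comparable scales (e.g. standardized) for |weight| to be a fair ranking. For real use, scikit-learn's `sklearn.feature_selection.RFE` wraps any estimator that exposes `coef_` or `feature_importances_`.

```python
import random
from typing import List, Sequence


def fit_linear(X: Sequence[Sequence[float]], y: Sequence[float], features: List[int],
               lr: float = 0.1, epochs: int = 300) -> List[float]:
    """Least-squares linear model on the chosen columns, fitted by batch gradient descent; returns weights."""
    n = len(X)
    w = [0.0] * len(features)
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(features)
        grad_b = 0.0
        for row, target in zip(X, y):
            err = b + sum(wj * row[f] for wj, f in zip(w, features)) - target
            for j, f in enumerate(features):
                grad_w[j] += err * row[f]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w


def rfe(X: Sequence[Sequence[float]], y: Sequence[float], n_keep: int) -> List[int]:
    """Recursive feature elimination: refit, drop the feature with the smallest |weight|, repeat."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        w = fit_linear(X, y, remaining)  # assumption: linear base estimator
        weakest = min(range(len(remaining)), key=lambda j: abs(w[j]))
        remaining.pop(weakest)
    return remaining


# Synthetic data (not voice data): only columns 0 and 2 carry signal; 1 and 3 are noise.
rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(200)]
y = [3 * r[0] - 2 * r[2] + rng.gauss(0, 0.1) for r in X]
selected = rfe(X, y, n_keep=2)
```

Refitting after every removal is what distinguishes RFE from ranking features once: importances can shift when correlated features leave the model.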
RESULTS:
The AdaBoost, RF, and LDA models achieved high prediction accuracies of 100%, 100%, and 93.3%, respectively, on the word-reading speech test set. On the text-reading speech test set, the accuracies of the AdaBoost, RF, and LDA models were 90%, 80%, and 90%, respectively, while those of the other three models were all below 80%. On real-world word-reading and text-reading speech data, the AdaBoost and RF models still achieved high predictive accuracies (91.7% and 80.6% for AdaBoost, and 86.1% and 77.8% for RF, respectively).
CONCLUSIONS
Analyzing vocal emotional characteristics allows effective identification of individuals with subthreshold depression. The AdaBoost and RF models show excellent performance in classifying individuals with subthreshold depression and may thus offer valuable assistance in clinical and research settings.
Humans
;
Machine Learning
;
Emotions
;
Depression/diagnosis*
;
Algorithms
;
Voice
;
Support Vector Machine
;
Male
;
Female
3.Construction of risk prediction models of hypothermia after transurethral holmium laser enucleation of the prostate based on three machine learning algorithms.
Jun JIANG ; Shuo FENG ; Yingui SUN ; Yan AN
Journal of Southern Medical University 2025;45(9):2019-2025
OBJECTIVES:
To develop risk prediction models for postoperative hypothermia after transurethral holmium laser enucleation of the prostate (HoLEP) using machine learning algorithms.
METHODS:
We retrospectively analyzed the clinical data of 403 patients from our center (283 in the training set and 120 in the internal validation set) and 120 patients from Weifang People's Hospital (external validation set). Risk prediction models were built using logistic regression, decision tree, and support vector machine (SVM), and model performance was evaluated in terms of accuracy, recall, precision, F1 score, and AUC.
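Precision, recall, and F1 are computed per cohort from the confusion counts, with hypothermia as the positive class; recall is the same quantity as sensitivity. The sketch below scores each of the three cohorts (training, internal validation, external validation) separately. The function name and the labels and predictions are hypothetical placeholders, not study data.

```python
from typing import Dict, Sequence


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Binary metrics with 1 = postoperative hypothermia (positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = len(y_true) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision, "recall": recall, "f1": f1}


# Hypothetical predictions for three cohorts (not study data); each cohort is scored separately.
cohorts = {
    "training": ([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0]),
    "internal_validation": ([1, 0, 1, 0], [1, 0, 1, 0]),
    "external_validation": ([1, 1, 0, 0], [1, 0, 0, 0]),
}
report = {name: precision_recall_f1(t, p) for name, (t, p) in cohorts.items()}
```

The external cohort in this toy example shows how a model can have perfect precision but only 50% recall, which is the kind of trade-off the results report between SVM and the decision tree.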
RESULTS:
Operation duration, prostate weight, intraoperative irrigation volume, and being underweight were identified as predictors of postoperative hypothermia after HoLEP. Among the three algorithms, SVM showed the best precision and accuracy in all three datasets and the best area under the ROC curve (AUC) in the training and internal validation sets, followed by logistic regression, which had a similar AUC in these two datasets. In the training set, SVM outperformed the logistic regression and decision tree models in precision, accuracy, F1 score, and AUC, but had a slightly lower recall than the decision tree. In the internal validation set, SVM outperformed both models in precision, accuracy, recall, F1 score, and AUC. In the external validation set, SVM also performed well, with better precision and accuracy than the logistic regression and decision tree models, but slightly lower recall, F1 score, and AUC than the decision tree model.
CONCLUSIONS
Among the three models, SVM has the best performance and generalizability for predicting the risk of hypothermia after HoLEP and may provide support for clinical decision-making.
Humans
;
Male
;
Retrospective Studies
;
Machine Learning
;
Transurethral Resection of Prostate/adverse effects*
;
Hypothermia/etiology*
;
Prostatic Hyperplasia/surgery*
;
Algorithms
;
Lasers, Solid-State
;
Risk Assessment
;
Postoperative Complications
;
Decision Trees
;
Logistic Models
;
Aged
;
Middle Aged
;
Support Vector Machine
4.Accurate Machine Learning-based Monitoring of Anesthesia Depth with EEG Recording.
Zhiyi TU ; Yuehan ZHANG ; Xueyang LV ; Yanyan WANG ; Tingting ZHANG ; Juan WANG ; Xinren YU ; Pei CHEN ; Suocheng PANG ; Shengtian LI ; Xiongjie YU ; Xuan ZHAO
Neuroscience Bulletin 2025;41(3):449-460
General anesthesia, pivotal for surgical procedures, requires precise depth monitoring to mitigate risks ranging from intraoperative awareness to postoperative cognitive impairments. Traditional assessment methods, relying on physiological indicators or behavioral responses, fall short of accurately capturing the nuanced states of unconsciousness. This study introduces a machine learning-based approach to decode anesthesia depth, leveraging EEG data across different anesthesia states induced by propofol and esketamine in rats. Our findings demonstrate the model's robust predictive accuracy, underscored by a novel intra-subject dataset partitioning and a 5-fold cross-validation method. The research diverges from conventional monitoring by utilizing anesthetic infusion rates as objective indicators of anesthesia states, highlighting distinct EEG patterns and enhancing prediction accuracy. Moreover, the model's ability to generalize across individuals suggests its potential for broad clinical application, distinguishing between anesthetic agents and their depths. Despite relying on rat EEG data, which poses questions about real-world applicability, our approach marks a significant advance in anesthesia monitoring.
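The abstract does not define its intra-subject partitioning in detail. One plausible reading, sketched here as an assumption, is that each rat's recording is split into k contiguous blocks so that every fold's test set contains data from every animal, while contiguity limits leakage between neighboring EEG windows. `intra_subject_kfold` is a hypothetical helper, and the subject IDs and window counts are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def intra_subject_kfold(subject_ids: Sequence[str], k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_idx, test_idx) pairs; each subject's samples (in time order) are cut into k contiguous blocks."""
    by_subject: Dict[str, List[int]] = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)
    test_folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_subject.values():
        n = len(indices)
        for fold in range(k):
            # integer block boundaries spread any remainder evenly across folds
            test_folds[fold].extend(indices[fold * n // k:(fold + 1) * n // k])
    everything = set(range(len(subject_ids)))
    return [(sorted(everything - set(test)), sorted(test)) for test in test_folds]


# Hypothetical layout: rat "r1" contributes 10 EEG windows, rat "r2" contributes 15.
subjects = ["r1"] * 10 + ["r2"] * 15
folds = intra_subject_kfold(subjects, k=5)
```

A leave-one-subject-out split would instead hold whole animals out, which tests the cross-individual generalization the abstract also mentions.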
Animals
;
Machine Learning
;
Electroencephalography/methods*
;
Ketamine/administration & dosage*
;
Rats
;
Male
;
Propofol/administration & dosage*
;
Rats, Sprague-Dawley
;
Anesthesia, General/methods*
;
Brain/physiology*
;
Intraoperative Neurophysiological Monitoring/methods*
5.A Novel Real-time Phase Prediction Network in EEG Rhythm.
Hao LIU ; Zihui QI ; Yihang WANG ; Zhengyi YANG ; Lingzhong FAN ; Nianming ZUO ; Tianzi JIANG
Neuroscience Bulletin 2025;41(3):391-405
Closed-loop neuromodulation, especially using the phase of the electroencephalography (EEG) rhythm to assess the real-time brain state and optimize the brain stimulation process, is becoming a hot research topic. Because the EEG signal is non-stationary, commonly used EEG phase prediction methods have large variances, which may reduce the accuracy of phase prediction. In this study, we proposed a machine learning-based EEG phase prediction network (EPN) that captures the overall rhythm distribution pattern of subjects and maps the instantaneous phase directly from narrow-band EEG data. We verified the performance of EPN on pre-recorded data, on simulated EEG data, and in a real-time experiment. Compared with widely used state-of-the-art models (optimized multi-layer filter architecture, autoregressive, and educated temporal prediction), EPN achieved the lowest variance and the highest accuracy. Thus, the EPN model may enable broader applications of EEG phase-based closed-loop neuromodulation.
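The instantaneous phase that real-time predictors such as EPN target is conventionally defined offline as the angle of the analytic signal of the narrow-band EEG, obtained with the Hilbert transform. Because this definition uses the whole window, including samples after the current time point, it cannot be computed causally at the edge of a live recording, which is why forward-prediction methods are needed. The sketch below assumes this definition (the abstract does not state the authors' ground truth) and builds the analytic signal with a slow O(N²) DFT for clarity; `scipy.signal.hilbert` performs the same construction with an FFT.

```python
import cmath
import math
from typing import List, Sequence


def analytic_signal(x: Sequence[float]) -> List[complex]:
    """Analytic signal via a direct O(N^2) DFT: keep DC (and Nyquist), double positive, zero negative bins."""
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)) for f in range(n)]
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        for f in range(1, n // 2):
            gain[f] = 2.0
    else:
        for f in range(1, (n + 1) // 2):
            gain[f] = 2.0
    shaped = [s * g for s, g in zip(spectrum, gain)]
    return [sum(shaped[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)) / n for t in range(n)]


def instantaneous_phase(x: Sequence[float]) -> List[float]:
    """Phase in radians, in [-pi, pi]."""
    return [cmath.phase(z) for z in analytic_signal(x)]


# A pure cosine completing 4 cycles in 64 samples (period = 16 samples).
signal = [math.cos(2 * math.pi * 4 * t / 64) for t in range(64)]
phase = instantaneous_phase(signal)
```

For this cosine the phase starts at 0 and advances by 2π every 16 samples, so sample 4 sits at π/2.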
Humans
;
Electroencephalography/methods*
;
Brain/physiology*
;
Machine Learning
;
Signal Processing, Computer-Assisted
;
Male
;
Adult
;
Neural Networks, Computer
;
Brain Waves/physiology*
6.Combined Study of Behavior and Spike Discharges Associated with Negative Emotions in Mice.
Jinru XIN ; Xinmiao WANG ; Xuechun MENG ; Ling LIU ; Mingqing LIU ; Huangrui XIONG ; Aiping LIU ; Ji LIU
Neuroscience Bulletin 2025;41(10):1843-1860
In modern society, people are increasingly exposed to chronic stress, leading to various mental disorders. However, the activities of brain regions, especially neural firing patterns related to specific behaviors, remain unclear. In this study, we introduce a novel approach, NeuroSync, which integrates open-field behavioral testing with electrophysiological recordings from emotion-related brain regions, specifically the central amygdala and the paraventricular nucleus of the hypothalamus, to explore the mechanisms of negative emotions induced by chronic stress in mice. By applying machine vision techniques, we quantified behaviors in the open field, and signal processing algorithms elucidated the neural underpinnings of the observed behaviors. Synchronizing behavioral and electrophysiological data revealed significant correlations between neural firing patterns and stress-related behaviors, providing insights into real-time brain activity underlying behavioral responses. This research combines deep learning and machine learning to synchronize high-resolution video and electrophysiological data, offering new insights into neural-behavioral dynamics under chronic stress conditions.
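The abstract does not describe how NeuroSync aligns its two data streams. A minimal version of the general idea, assuming spike timestamps and video frame timestamps already share one clock and taking locomotion speed as the behavioral variable (both assumptions), is to count spikes within each frame interval and correlate the counts with the per-frame behavior. The function names and the synthetic spikes are hypothetical; the spikes are constructed to track speed exactly, so the correlation is 1 by design.

```python
import math
from bisect import bisect_left
from typing import List, Sequence


def spikes_per_frame(spike_times: Sequence[float], frame_times: Sequence[float]) -> List[int]:
    """Count spikes in each video-frame interval [frame_times[i], frame_times[i + 1])."""
    spikes = sorted(spike_times)
    return [bisect_left(spikes, stop) - bisect_left(spikes, start)
            for start, stop in zip(frame_times[:-1], frame_times[1:])]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic example: 8 frames at a hypothetical 10 Hz, one speed value per frame.
frame_times = [i * 0.1 for i in range(9)]
speed = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 4.0]
# place int(speed) spikes in the middle of each frame so firing tracks speed exactly
spike_times = [frame_times[i] + 0.05 + 0.001 * j for i, s in enumerate(speed) for j in range(int(s))]
counts = spikes_per_frame(spike_times, frame_times)
r = pearson(counts, speed)
```

With real recordings, clock drift between the camera and the acquisition system would need correcting (for example with shared TTL pulses) before this binning step is meaningful.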
Animals
;
Mice
;
Male
;
Emotions/physiology*
;
Stress, Psychological/physiopathology*
;
Action Potentials/physiology*
;
Mice, Inbred C57BL
;
Behavior, Animal/physiology*
;
Machine Learning
;
Amygdala/physiopathology*
;
Neurons/physiology*
;
Paraventricular Hypothalamic Nucleus/physiopathology*
;
Brain/physiology*
7.Determining the biomarkers and pathogenesis of myocardial infarction combined with ankylosing spondylitis via a systems biology approach.
Chunying LIU ; Chengfei PENG ; Xiaodong JIA ; Chenghui YAN ; Dan LIU ; Xiaolin ZHANG ; Haixu SONG ; Yaling HAN
Frontiers of Medicine 2025;19(3):507-522
Ankylosing spondylitis (AS) is linked to an increased prevalence of myocardial infarction (MI). However, research dedicated to elucidating the pathogenesis of AS-MI is lacking. In this study, we explored biomarkers to improve the diagnosis and treatment of AS-MI. Datasets were obtained from the Gene Expression Omnibus database. We employed weighted gene co-expression network analysis and machine learning models to screen hub genes. Receiver operating characteristic (ROC) curves and a nomogram were used to assess diagnostic accuracy. Gene set enrichment analysis was conducted to reveal the potential functions of the hub genes, and immune infiltration analysis was used to assess the correlation between the hub genes and the immune landscape. Subsequently, we performed single-cell analysis to identify the expression and cellular localization of the hub genes. We further constructed a transcription factor (TF)-microRNA (miRNA) regulatory network. Finally, drug prediction and molecular docking were performed. S100A12 and MCEMP1 were identified as hub genes and were correlated with immune-related biological processes. They exhibited high diagnostic value and were predominantly expressed in myeloid cells. Furthermore, 24 TFs and 9 miRNAs were associated with these hub genes. Enzastaurin, meglitinide, and nifedipine were predicted as potential therapeutic agents. Our study indicates that S100A12 and MCEMP1 have significant potential as biomarkers and therapeutic targets for AS-MI, offering novel insights into the underlying etiology of this condition.
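Weighted gene co-expression network analysis (WGCNA) builds a soft-thresholded network in which the adjacency of genes i and j is |cor(i, j)|^β, and a gene's connectivity is the sum of its adjacencies. The sketch below computes unsigned adjacency and whole-network connectivity and takes the most connected gene as the hub; β = 6 and the three-gene expression profiles are hypothetical. Full WGCNA chooses β by a scale-free topology fit and ranks hubs within modules (for example by module membership), which this sketch omits.

```python
import math
from typing import Dict, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def wgcna_connectivity(expr: Dict[str, Sequence[float]], beta: int = 6) -> Dict[str, float]:
    """Unsigned soft-threshold adjacency a_ij = |cor(i, j)| ** beta; connectivity k_i = sum over j != i of a_ij."""
    genes = list(expr)
    conn = {g: 0.0 for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            a_ij = abs(pearson(expr[gi], expr[gj])) ** beta
            conn[gi] += a_ij
            conn[gj] += a_ij
    return conn


# Hypothetical expression profiles across six samples (not GEO data).
expr = {
    "A": [1, 2, 3, 4, 5, 6],
    "B": [1, 2, 3, 4, 6, 5],
    "C": [2, 1, 3, 4, 5, 6],
}
conn = wgcna_connectivity(expr)
hub = max(conn, key=conn.get)
```

Raising correlations to a power suppresses weak links much more than strong ones, which is why the soft threshold favors genes that are tightly co-expressed with many others.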
Humans
;
Spondylitis, Ankylosing/complications*
;
Systems Biology/methods*
;
Myocardial Infarction/diagnosis*
;
Biomarkers/metabolism*
;
MicroRNAs/genetics*
;
Gene Regulatory Networks
;
Gene Expression Profiling
;
Machine Learning
8.Establishment of a mortality prediction model for elderly critically ill patients using simple bedside indicators and interpretable machine learning algorithms.
Yulan MENG ; Jiaxin LI ; Xinqiang SHAN ; Pengyu LU ; Wei HUANG
Chinese Critical Care Medicine 2025;37(2):170-176
OBJECTIVE:
To explore the feasibility of incorporating simple bedside indicators into a mortality prediction model for elderly critically ill patients based on interpretable machine learning algorithms, providing a new scheme for clinical disease assessment.
METHODS:
Elderly critically ill patients aged ≥ 65 years who were hospitalized in the intensive care unit (ICU) of Tacheng People's Hospital of Ili Kazak Autonomous Prefecture from June 2017 to May 2020 were retrospectively selected. Basic parameters, including demographic characteristics, basic vital signs, and fluid intake and output within 24 hours after admission, as well as the acute physiology and chronic health evaluation II (APACHE II) score, Glasgow coma scale (GCS) score, and sequential organ failure assessment (SOFA) score, were collected. According to in-hospital outcomes, patients were divided into a survival group and a death group. Four datasets were constructed: a baseline dataset (B), including age, body temperature, heart rate, pulse oxygen saturation, respiratory rate, mean arterial pressure, urine output, infusion volume, and crystalloid volume; a B+APACHE II dataset (BA); a B+GCS dataset (BG); and a B+SOFA dataset (BS). Three machine learning algorithms, logistic regression (LR), extreme gradient boosting (XGBoost), and gradient boosting decision tree (GBDT), were then used to develop mortality prediction models on each of the four datasets. A feature importance histogram was drawn for each prediction model using the SHapley Additive exPlanations (SHAP) method. The area under the curve (AUC), accuracy, and F1 score of each model were compared to determine the optimal prediction model, for which a nomogram was then constructed.
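For a linear model such as logistic regression, SHAP values on the log-odds scale have a closed form when features are treated as independent: feature i contributes w_i * (x_i - mean(x_i)), and each patient's contributions sum to that patient's log-odds minus the average log-odds. The bars in a SHAP importance chart are typically the mean absolute SHAP value per feature. The weights, bias, and feature matrix below are hypothetical; the tree models (XGBoost, GBDT) in the study would instead need a tree-specific explainer such as `shap.TreeExplainer`.

```python
from typing import List, Sequence


def linear_shap(weights: Sequence[float], X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Exact SHAP values (log-odds scale) for a linear model, assuming independent features."""
    n = len(X)
    means = [sum(row[i] for row in X) / n for i in range(len(weights))]
    return [[w * (row[i] - means[i]) for i, w in enumerate(weights)] for row in X]


def mean_abs_shap(shap_values: List[List[float]]) -> List[float]:
    """Global importance per feature: mean |SHAP| across patients (the bar heights)."""
    n = len(shap_values)
    return [sum(abs(r[i]) for r in shap_values) / n for i in range(len(shap_values[0]))]


# Hypothetical two-feature model: log-odds = 0.1 + 0.5 * x0 - 2.0 * x1
bias, weights = 0.1, [0.5, -2.0]
X = [[1.0, 0.0], [3.0, 1.0], [2.0, 2.0]]
phi = linear_shap(weights, X)
importance = mean_abs_shap(phi)

# Efficiency check: per-patient SHAP values sum to log-odds minus the mean log-odds.
log_odds = [bias + sum(w * x for w, x in zip(weights, row)) for row in X]
base_value = sum(log_odds) / len(log_odds)
```

Note that SHAP values here live on the log-odds scale, so a feature's bar length reflects its effect on the linear predictor, not directly on the predicted probability of death.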
RESULTS:
A total of 392 patients were included, 341 in the survival group and 51 in the death group. There were statistically significant differences in heart rate, pulse oxygen saturation, mean arterial pressure, infusion volume, crystalloid volume, and etiological distribution between the two groups. The top three causes of death were shock, cerebral hemorrhage, and chronic obstructive pulmonary disease. Among the 12 prognostic models trained with the three machine learning algorithms, the models based on the B dataset lagged behind overall, whereas the LR model trained on the BA dataset performed best, with an AUC of 0.767 [95% confidence interval (95%CI) 0.692-0.836], accuracy of 0.875 (95%CI 0.837-0.903), and F1 score of 0.190. The top three variables in this model were crystalloid volume within the first 24 hours, heart rate, and mean arterial pressure. The nomogram of the model indicated that a total score between 150 and 230 was advisable.
CONCLUSION
The interpretable machine learning model combining simple bedside parameters with the APACHE II score could effectively identify the risk of death in critically ill elderly patients.
Humans
;
Critical Illness
;
Machine Learning
;
Aged
;
Algorithms
;
Intensive Care Units
;
Retrospective Studies
;
APACHE
;
Prognosis
;
Organ Dysfunction Scores
;
Hospital Mortality
;
Male
;
Female
9.Construction and external validation of a machine learning-based prediction model for epilepsy one year after acute stroke.
Wenkao ZHOU ; Fangli ZHAO ; Xingqiang QIU ; Yujuan YANG ; Tingting WANG ; Lingyan HUANG
Chinese Critical Care Medicine 2025;37(5):445-451
OBJECTIVE:
To identify the optimal machine learning algorithm for predicting post-stroke epilepsy (PSE) within one year following acute stroke, establish a nomogram model based on this algorithm, and perform external validation to achieve accurate prediction of secondary epilepsy.
METHODS:
A total of 870 acute stroke patients admitted to the emergency department of Xiang'an Hospital of Xiamen University from June 2019 to June 2023 were enrolled for model development (model group). An external validation cohort of 435 acute stroke patients admitted to the Fifth Hospital of Xiamen during the same period was used to validate the machine learning algorithms and the nomogram model. Patients were classified into control and epilepsy groups according to whether PSE developed within one year. Clinical and laboratory data, including baseline characteristics, stroke location, vascular status, complications, hematologic parameters, and National Institutes of Health Stroke Scale (NIHSS) score, were collected for analysis. Nine machine learning algorithms (logistic regression, CN2 rule induction, K-nearest neighbors, adaptive boosting, random forest, gradient boosting, support vector machine, naive Bayes, and neural network) were applied, and their predictive performance was evaluated. The area under the receiver operating characteristic (ROC) curve (AUC) was used to identify the optimal algorithm. Logistic regression was used to screen risk factors for PSE, and the top 10 predictors were selected to construct the nomogram model. The predictive performance of the model was evaluated using ROC curves in both the model and validation groups.
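A nomogram is a graphical rescaling of the logistic model's linear predictor. One common convention, assumed here, gives each predictor 0 points at the end of its range with the lowest risk and scales all predictors so that the one with the widest effect range (|β| times its range) spans 0 to 100 points; summing the points and mapping back through the logistic function recovers the predicted probability. The coefficients, ranges, intercept, and the helper `build_nomogram` are hypothetical and only reuse three of the study's variable names.

```python
import math
from typing import Callable, Dict, Tuple


def build_nomogram(coefs: Dict[str, float], ranges: Dict[str, Tuple[float, float]], intercept: float
                   ) -> Tuple[Callable[[str, float], float], Callable[[float], float]]:
    """Return (points(var, value), probability(total_points)) for a logistic model."""
    # effect range of each predictor on the log-odds scale
    spans = {v: abs(b) * (ranges[v][1] - ranges[v][0]) for v, b in coefs.items()}
    max_span = max(spans.values())
    # reference value = range end giving the lowest log-odds, so points are never negative
    ref = {v: (ranges[v][0] if b >= 0 else ranges[v][1]) for v, b in coefs.items()}
    lp_at_ref = intercept + sum(b * ref[v] for v, b in coefs.items())

    def points(var: str, value: float) -> float:
        return 100.0 * coefs[var] * (value - ref[var]) / max_span

    def probability(total_points: float) -> float:
        lp = lp_at_ref + total_points * max_span / 100.0
        return 1.0 / (1.0 + math.exp(-lp))

    return points, probability


# Hypothetical coefficients and ranges (not the published model).
coefs = {"nihss": 0.2, "brain_herniation": 2.5, "serum_sodium": -0.1}
ranges = {"nihss": (0, 30), "brain_herniation": (0, 1), "serum_sodium": (125, 150)}
points, probability = build_nomogram(coefs, ranges, intercept=-4.0)
patient = {"nihss": 10, "brain_herniation": 1, "serum_sodium": 135}
total = sum(points(v, x) for v, x in patient.items())
risk = probability(total)
```

Because the point scale is linear in the log-odds, reading a probability off the total-points axis is exactly equivalent to evaluating the fitted logistic model for that patient.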
RESULTS:
Among the 870 patients in the model group, 29 developed PSE within one year. Among the nine algorithms tested, logistic regression demonstrated the best performance and generalizability, with an AUC of 0.923. Univariate logistic regression identified several risk factors for PSE, including platelet count, white blood cell count, red blood cell count, glycated hemoglobin (HbA1c), C-reactive protein (CRP), triglycerides, high-density lipoprotein (HDL), aspartate aminotransferase (AST), alanine aminotransferase (ALT), activated partial thromboplastin time (APTT), thrombin time, D-dimer, fibrinogen, creatine kinase (CK), creatine kinase-MB (CK-MB), lactate dehydrogenase (LDH), serum sodium, lactic acid, anion gap, NIHSS score, brain herniation, periventricular stroke, and carotid artery plaque. Multivariate logistic regression analysis further showed that white blood cell count, HDL, fibrinogen, lactic acid, and brain herniation were independent risk factors [odds ratios (ORs) 1.837, 198.039, 47.025, 11.559, and 70.722, respectively; all P < 0.05]. In the external validation group, univariate logistic regression analysis showed that platelet count, white blood cell count, CRP, triglycerides, APTT, D-dimer, fibrinogen, CK, CK-MB, LDH, NIHSS score, and brain herniation were risk factors for PSE one year after acute stroke, and multivariate logistic regression analysis showed that APTT and brain herniation were independent predictors (ORs 0.587 and 116.193, respectively; both P < 0.05). The nomogram model, constructed using 10 key variables (brain herniation, periventricular stroke, carotid artery plaque, white blood cell count, triglycerides, thrombin time, D-dimer, serum sodium, lactic acid, and NIHSS score), achieved an AUC of 0.908 in the model group and 0.864 in the external validation group.
CONCLUSIONS
The logistic regression-based prediction model for epilepsy one year after acute stroke, developed using machine learning algorithms, showed optimal predictive performance. The nomogram model based on the logistic regression-derived predictors showed strong discriminative power and was successfully validated externally, suggesting favorable clinical applicability and generalizability.
Humans
;
Machine Learning
;
Stroke/complications*
;
Nomograms
;
Epilepsy/etiology*
;
Algorithms
;
Male
;
Female
;
Logistic Models
;
Middle Aged
;
Aged
;
Risk Factors
;
Bayes Theorem
10.Early warning method for invasive mechanical ventilation in septic patients based on machine learning model.
Wanjun LIU ; Wenyan XIAO ; Jin ZHANG ; Juanjuan HU ; Shanshan HUANG ; Yu LIU ; Tianfeng HUA ; Min YANG
Chinese Critical Care Medicine 2025;37(7):644-650
OBJECTIVE:
To develop a method for identifying high-risk patients among septic populations requiring mechanical ventilation, and to conduct phenotypic analysis based on this method.
METHODS:
Data from four sources were used: the Medical Information Mart for Intensive Care (MIMIC-IV 2.0 and MIMIC-III 1.4), the Philips eICU Collaborative Research Database 2.0 (eICU-CRD 2.0), and a dataset from the Second Affiliated Hospital of Anhui Medical University. Adult intensive care unit (ICU) patients who met the Sepsis-3 criteria and received invasive mechanical ventilation (IMV) on the first day of their first admission were enrolled. The MIMIC-IV dataset, which had the highest data integrity, was divided into a training set and a test set at a 6:1 ratio, and the remaining datasets served as validation sets. Demographic information, comorbidities, laboratory indicators, commonly used ICU scores, and treatment measures were extracted. Clinical data collected within the first day of ICU admission were used to calculate the sequential organ failure assessment (SOFA) score. K-means clustering was applied to the SOFA score components, and the sum of squared errors (SSE) and Davies-Bouldin index (DBI) were used to determine the optimal number of disease subtypes. For the clustering results, baseline characteristics were normalized and compared by visualization, and Kaplan-Meier curves were used to analyze clinical outcomes across phenotypes.
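The cluster-number selection described above can be sketched with Lloyd's K-means plus the two criteria: SSE (the within-cluster sum of squared distances, inspected for an elbow) and the Davies-Bouldin index (lower is better). The sketch assumes deterministic farthest-point seeding and uses toy two-dimensional points; in the study, each point would be a patient's six SOFA component scores. scikit-learn offers `KMeans` and `davies_bouldin_score` for real use.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def assign(points: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j])) for p in points]


def farthest_point_init(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic seeding (an assumption): first point, then repeatedly the point farthest from all seeds."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm: alternate nearest-center assignment and centroid updates until stable."""
    centers = farthest_point_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # an empty cluster keeps its previous center
            new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers), centers


def sse(points, labels, centers) -> float:
    """Within-cluster sum of squared Euclidean distances (the elbow-plot quantity)."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


def davies_bouldin(points, labels, centers) -> float:
    """Mean over clusters of the worst (S_i + S_j) / d(c_i, c_j) ratio; assumes no empty clusters."""
    k = len(centers)
    scatter = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        scatter.append(sum(math.dist(p, centers[j]) for p in members) / len(members))
    return sum(max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j]) for j in range(k) if j != i)
               for i in range(k)) / k


# Toy data: three tight, well-separated blobs of five 2-D points each (not SOFA data).
offsets = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5), (0.0, 0.0)]
points = [(cx + dx, cy + dy) for cx, cy in [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)] for dx, dy in offsets]
scores = {}
for k in (2, 3):
    labels, centers = kmeans(points, k)
    scores[k] = (sse(points, labels, centers), davies_bouldin(points, labels, centers))
```

SSE always falls as k grows, so it is read for an elbow, whereas DBI penalizes clusters that are wide relative to their separation and can be compared directly across k.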
RESULTS:
This study enrolled patients from the MIMIC-IV dataset (n = 11 166), MIMIC-III dataset (n = 4 821), eICU-CRD dataset (n = 6 624), and a local dataset (n = 110); the four datasets had similar median ages, and males accounted for more than 50% of patients in each. Of the MIMIC-IV dataset, 85% was used as the training set and 15% as the test set, and the remaining datasets were used as validation sets. K-means clustering based on the six SOFA component scores identified 3 as the optimal number of clusters, and patients were classified into three phenotypes. In the training set, compared with patients with phenotypes II and III, those with phenotype I had more severe circulatory and respiratory dysfunction, a higher proportion of vasoactive drug use, more pronounced metabolic acidosis and hypoxia, and a higher incidence of congestive heart failure. Phenotype II was dominated by respiratory dysfunction with greater visceral injury. Patients with phenotype III had relatively stable organ function. These characteristics were consistent in both the test and validation sets. Analysis of infection-related indicators showed that, compared with patients with phenotypes II and III, patients with phenotype I had the highest SOFA scores within 7 days after ICU admission, a platelet count (PLT) that first decreased and later increased, and higher neutrophil, lymphocyte, and monocyte counts; their blood cultures also had higher positivity rates for Gram-positive bacteria, Gram-negative bacteria, and fungi. Kaplan-Meier curves indicated that, in the training, test, and validation sets, the 28-day cumulative mortality of patients with phenotype I was significantly higher than that of patients with phenotypes II and III.
CONCLUSIONS
Three distinct phenotypes of septic patients receiving IMV were derived using unsupervised machine learning. Phenotype I, characterized by cardiorespiratory failure, can be used for the early identification of high-risk patients in this population. Moreover, patients with this phenotype are more prone to bloodstream infections and have a poor prognosis.
Humans
;
Machine Learning
;
Sepsis/therapy*
;
Respiration, Artificial
;
Intensive Care Units
;
Organ Dysfunction Scores
;
Male
;
Female
;
Middle Aged
;
Adult
