1.Comparison of Logistic Regression and Machine Learning Approaches in Predicting Depressive Symptoms: A National-Based Study
Xing-Xuan DONG ; Jian-Hua LIU ; Tian-Yang ZHANG ; Chen-Wei PAN ; Chun-Hua ZHAO ; Yi-Bo WU ; Dan-Dan CHEN
Psychiatry Investigation 2025;22(3):267-278
Objective:
Machine learning (ML) has been reported to have better predictive capability than traditional statistical techniques. The aim of this study was to assess the efficacy of ML algorithms and logistic regression (LR) for predicting depressive symptoms during the COVID-19 pandemic.
Methods:
Analyses were carried out in a national cross-sectional study involving 21,916 participants. The ML algorithms in this study included random forest (RF), support vector machine (SVM), neural network (NN), and gradient boosting machine (GBM) methods. The performance indices were sensitivity, specificity, accuracy, precision, F1-score, and area under the receiver operating characteristic curve (AUC).
Results:
LR and NN had the best performance in terms of AUCs. The risk of overfitting was found to be negligible for most ML models except for RF, and GBM obtained the highest sensitivity, specificity, accuracy, precision, and F1-score. Therefore, LR, NN, and GBM models ranked among the best models.
Conclusion:
The LR model performed comparably to the ML models in predicting depressive symptoms and identifying potential risk factors, while also exhibiting a lower risk of overfitting.
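The threshold-based performance indices named in the Methods above (sensitivity, specificity, accuracy, precision, and F1-score) all derive from the four cells of a confusion matrix. The following is a minimal sketch of those standard definitions; the function name and the counts are hypothetical illustrations, not data from the study. AUC is omitted because it requires predicted scores rather than counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a binary classifier.

    tp, fp, tn, fn are counts of true positives, false positives,
    true negatives, and false negatives (hypothetical inputs).
    """
    sensitivity = tp / (tp + fn)            # recall: share of true cases detected
    specificity = tn / (tn + fp)            # share of non-cases correctly ruled out
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)              # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
    }


# Made-up counts for illustration only.
metrics = classification_metrics(tp=80, fp=30, tn=170, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because F1 combines precision and sensitivity, a model can rank highest on these five indices while another ranks highest on AUC, which is consistent with the split between GBM and LR/NN reported in the Results.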
6.Explainable machine learning model for predicting septic shock in critically sepsis patients based on coagulation indexes: A multicenter cohort study.
Qing-Bo ZENG ; En-Lan PENG ; Ye ZHOU ; Qing-Wei LIN ; Lin-Cui ZHONG ; Long-Ping HE ; Nian-Qing ZHANG ; Jing-Chun SONG
Chinese Journal of Traumatology 2025;28(6):404-411
PURPOSE:
Septic shock is associated with high mortality and poor outcomes among sepsis patients with coagulopathy. Although traditional statistical methods or machine learning (ML) algorithms have been proposed to predict septic shock, these potential approaches have never been systematically compared. The present work aimed to develop and compare models to predict septic shock among patients with sepsis.
METHODS:
This was a retrospective cohort study of 484 patients with sepsis who were admitted to our intensive care units between May 2018 and November 2022. Patients from the 908th Hospital of Chinese PLA Logistical Support Force and Nanchang Hongdu Hospital of Traditional Chinese Medicine were allocated to the training (n=311) and validation (n=173) sets, respectively. All clinical and laboratory data of sepsis patients characterized by comprehensive coagulation indexes were collected. We developed 5 models based on ML algorithms and 1 model based on a traditional statistical method to predict septic shock in the training cohort. The performance of all models was assessed using the area under the receiver operating characteristic curve and calibration plots. Decision curve analysis was used to evaluate the net benefit of the models. The validation set was used to verify the predictive accuracy of the models. The Shapley additive explanations method was also used to assess variable importance and explain the predictions made by an ML algorithm.
RESULTS:
Among all patients, 37.2% experienced septic shock. The areas under the receiver operating characteristic curves of the 6 models ranged from 0.833 to 0.962 in the training set and from 0.630 to 0.744 in the validation set. The model with the best prediction performance was based on the support vector machine (SVM) algorithm and was constructed from age, tissue plasminogen activator-inhibitor complex, prothrombin time, international normalized ratio, white blood cells, and platelet counts. The SVM model showed good calibration and discrimination and a greater net benefit in decision curve analysis.
CONCLUSION:
The SVM algorithm may be superior to other ML and traditional statistical algorithms for predicting septic shock. Shapley additive explanations value analysis can help physicians better understand the reliability of the predictive model.
Humans; Shock, Septic/blood*; Machine Learning; Male; Female; Retrospective Studies; Middle Aged; Aged; Sepsis/complications*; ROC Curve; Cohort Studies; Adult; Intensive Care Units; Algorithms; Blood Coagulation; Critical Illness
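The decision curve analysis mentioned in the abstract above compares models by net benefit at a chosen risk threshold. The sketch below uses the standard net-benefit formula, TP/n minus FP/n weighted by the threshold odds; the function names and the six example patients are hypothetical and do not reproduce the study's data or code.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    y_true: list of 0/1 outcomes; y_prob: predicted probabilities.
    Uses the standard formula: TP/n - FP/n * (pt / (1 - pt)).
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    weight = threshold / (1 - threshold)  # odds at the threshold probability
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical outcomes and predicted risks, for illustration only.
outcomes = [1, 1, 0, 0, 1, 0]
risks = [0.9, 0.6, 0.4, 0.2, 0.3, 0.7]
print(net_benefit(outcomes, risks, threshold=0.5))
print(net_benefit_treat_all(outcomes, threshold=0.5))
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all value and zero (the treat-none strategy); a decision curve plots this comparison across a range of thresholds.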
8.Association of Body Mass Index with All-Cause Mortality and Cause-Specific Mortality in Rural China: 10-Year Follow-up of a Population-Based Multicenter Prospective Study.
Juan Juan HUANG ; Yuan Zhi DI ; Ling Yu SHEN ; Jian Guo LIANG ; Jiang DU ; Xue Fang CAO ; Wei Tao DUAN ; Ai Wei HE ; Jun LIANG ; Li Mei ZHU ; Zi Sen LIU ; Fang LIU ; Shu Min YANG ; Zu Hui XU ; Cheng CHEN ; Bin ZHANG ; Jiao Xia YAN ; Yan Chun LIANG ; Rong LIU ; Tao ZHU ; Hong Zhi LI ; Fei SHEN ; Bo Xuan FENG ; Yi Jun HE ; Zi Han LI ; Ya Qi ZHAO ; Tong Lei GUO ; Li Qiong BAI ; Wei LU ; Qi JIN ; Lei GAO ; He Nan XIN
Biomedical and Environmental Sciences 2025;38(10):1179-1193
OBJECTIVE:
This study aimed to explore the association between body mass index (BMI) and mortality based on a 10-year population-based multicenter prospective study.
METHODS:
A general population-based multicenter prospective study was conducted at four sites in rural China between 2013 and 2023. Multivariate Cox proportional hazards models and restricted cubic spline analyses were used to assess the association between BMI and mortality. Stratified analyses were performed based on the individual characteristics of the participants.
RESULTS:
Overall, 19,107 participants contributing a total of 163,095 person-years were included, and 1,910 participants died. Underweight participants (< 18.5 kg/m²) had increased all-cause mortality (adjusted hazard ratio [aHR] = 2.00, 95% confidence interval [CI]: 1.66-2.41), whereas overweight (≥ 24.0 to < 28.0 kg/m²) and obesity (≥ 28.0 kg/m²) were associated with decreased mortality, with aHRs of 0.61 (95% CI: 0.52-0.73) and 0.51 (95% CI: 0.37-0.70), respectively. Overweight (aHR = 0.76, 95% CI: 0.67-0.86) and mild obesity (aHR = 0.72, 95% CI: 0.59-0.87) were associated with lower mortality in people older than 60 years. All-cause mortality decreased rapidly until reaching a BMI of 25.7 kg/m² (aHR = 0.95, 95% CI: 0.92-0.98) and increased slightly above that value, indicating a U-shaped association. The inverse association between overweight and mortality was robust in most subgroups and sensitivity analyses.
CONCLUSION:
This study provides additional evidence that overweight and mild obesity may be inversely related to the risk of death in individuals older than 60 years. Therefore, it is essential to consider age differences when formulating health and weight management strategies.
Humans; Body Mass Index; China/epidemiology*; Male; Female; Middle Aged; Prospective Studies; Rural Population/statistics & numerical data*; Aged; Follow-Up Studies; Adult; Mortality; Cause of Death; Obesity/mortality*; Overweight/mortality*
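The BMI categories in the abstract above can be expressed as a small classifier. This is a sketch only: the underweight, overweight, and obesity cut-offs are those stated in the Results, while the "normal" band (18.5 to < 24.0 kg/m²) is inferred as the remaining interval and is not stated explicitly in the abstract. The function names are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / (height_m ** 2)


def bmi_category(value):
    """Classify a BMI value using the cut-offs quoted in the abstract.

    The 'normal' band is an inference (the interval left between the
    underweight and overweight cut-offs), not a value from the source.
    """
    if value < 18.5:
        return "underweight"
    if value < 24.0:
        return "normal"       # inferred reference band
    if value < 28.0:
        return "overweight"
    return "obesity"


# Hypothetical participant: 70 kg, 1.70 m.
value = bmi(70, 1.70)
print(round(value, 1), bmi_category(value))
```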
9.A newly proposed heatstroke-induced coagulopathy score in patients with heat illness: A multicenter retrospective study in China
Qing-Wei LIN ; Lin-Cui ZHONG ; Long-Ping HE ; Qing-Bo ZENG ; Wei ZHANG ; Qing SONG ; Jing-Chun SONG
Chinese Journal of Traumatology 2024;27(2):83-90
Purpose:
In patients with heatstroke, disseminated intravascular coagulation (DIC) is associated with a greater risk of in-hospital mortality. However, time-consuming assays or a complex diagnostic system may delay immediate treatment. Therefore, the present study proposes a new heatstroke-induced coagulopathy (HIC) score in patients with heat illness as an early warning indicator for DIC.
Methods:
This retrospective study enrolled patients with heat illness in 24 Chinese hospitals from March 2021 to May 2022. Patients who were under 18 years old, had a congenital clotting disorder or liver disease, or were using anticoagulants were excluded. Data were collected on demographic characteristics, routine blood tests, conventional coagulation assays, and biochemical indexes. The risk factors related to coagulation function in heatstroke were identified by regression analysis and used to construct a scoring system for HIC. The data of patients who met the diagnostic criteria for HIC and for DIC as defined by the International Society on Thrombosis and Haemostasis (ISTH) were analyzed. All statistical analyses were performed using SPSS 26.0.
Results:
The final analysis included 302 patients with heat illness, of whom 131 (43.4%) suffered from heatstroke, including 7 deaths (5.3%). Core temperature (OR = 1.681, 95% CI 1.291-2.189, p < 0.001), prothrombin time (OR = 1.427, 95% CI 1.175-1.733, p < 0.001), and D-dimer (OR = 1.242, 95% CI 1.049-1.471, p = 0.012) were independent risk factors for heatstroke and, because of their close relation with abnormal coagulation, were used to construct the HIC scoring system. A total score ≥ 3 indicated HIC, and HIC scores correlated with the ISTH DIC score (r = 0.8848, p < 0.001). Among the 131 heatstroke patients, the incidence of HIC (27.5%) was higher than that of DIC (11.2%), while the mortality rate of HIC (19.4%) was lower than that of DIC (46.7%). When HIC developed into DIC, parameters of coagulation dysfunction changed significantly: platelet count decreased, D-dimer level rose, and prothrombin time and activated partial thromboplastin time were prolonged (p < 0.05).
Conclusions:
The newly proposed HIC score may provide a valuable tool for early detection of HIC and prompt initiation of treatment.
10.Migraineur patent foramen ovale risk prediction model for female migraine patient streaming and clinical decision-making
Xiao-Chun ZHANG ; Jia-Ning FAN ; Li ZHU ; Feng ZHANG ; Da-Wei LIN ; Wan-Ling WANG ; Wen-Zhi PAN ; Da-Xin ZHOU ; Jun-Bo GE
Fudan University Journal of Medical Sciences 2024;51(4):505-514
Objective:
To investigate the clinical characteristics of female migraine patients with patent foramen ovale (PFO) and design a risk prediction model for PFO in female migraine patients (migraineur patients PFO risk prediction model, MPRPM).
Methods:
Female migraine patients who visited Zhongshan Hospital, Fudan University from Jun 1, 2019 to Dec 31, 2022 were included. Preoperative information and follow-up results after discontinuation of medication were collected. Patients were divided into PFO-positive and PFO-negative groups based on transesophageal echocardiography results. A multivariate logistic regression model and a random forest model were constructed, and the random forest model was validated multidimensionally. Key features were selected based on the mean decrease accuracy (MDA) to construct MPRPM.
Results:
A total of 305 female patients were included in the study, with 204 patients in the PFO-positive group and 101 patients in the PFO-negative group. Multivariate logistic regression analysis showed that age at migraine onset, attack frequency, severe impact on life during attacks, exercise-related headaches, menstruation-induced headaches, aura migraines, and a history of cryptogenic stroke were predictive factors for PFO positivity. The random forest model effectively predicted the incidence of PFO in female migraine patients, with an AUC of 0.895 (95% CI: 0.847-0.943). MPRPM demonstrated a sensitivity of 71.6% and specificity of 91.1% (AUC: 0.862, 95% CI: 0.818-0.906, P<0.001). The optimal cut-off value was 2.5 points. Patients correctly classified by the model showed a higher rate of symptom improvement compared to incorrectly classified patients (94.3% vs. 82.0%, P=0.023).
Conclusion:
We identified predictive factors for PFO in migraine patients. MPRPM can provide guidance in the diagnostic process and therapeutic decision-making for female migraine patients, assist in patient triage, and reduce the healthcare burden.
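As a consistency check on the figures above, the reported sensitivity and specificity can be combined with the group sizes to back-calculate an approximate confusion matrix. The sketch below is an illustration under that assumption: the cell counts are derived by rounding and are not reported in the abstract, and the variable names are hypothetical.

```python
# Group sizes and rates as reported in the abstract.
pfo_positive = 204
pfo_negative = 101
reported_sensitivity = 0.716
reported_specificity = 0.911

# Back-calculated cell counts (assumption: nearest whole patient).
tp = round(reported_sensitivity * pfo_positive)  # true positives
tn = round(reported_specificity * pfo_negative)  # true negatives
fn = pfo_positive - tp                           # missed PFO-positive patients
fp = pfo_negative - tn                           # PFO-negative patients flagged

# Recompute the rates from the whole-number counts.
sensitivity = tp / pfo_positive
specificity = tn / pfo_negative
accuracy = (tp + tn) / (pfo_positive + pfo_negative)

print(f"TP={tp} FN={fn} TN={tn} FP={fp}")
print(f"sensitivity={sensitivity:.1%} specificity={specificity:.1%} "
      f"accuracy={accuracy:.1%}")
```

The recomputed sensitivity and specificity round back to the reported 71.6% and 91.1%, so the published rates are mutually consistent with the stated group sizes. The overall accuracy printed here is a derived figure, not one given in the abstract.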