Construction of an in-hospital mortality prediction model for emergency multiple trauma patients based on supervised machine learning algorithms
10.3760/cma.j.cn115396-20241229-00412
- VernacularTitle:基于监督机器学习算法构建急诊多发伤患者院内死亡的预测模型
- Author: Dongming HUANG¹; Weiliang WANG
- Author Information:
1. Department of Emergency Medicine, Daxing Teaching Hospital, Capital Medical University, Beijing 102600
- Keywords: Multiple trauma; Emergency treatment; Hospital mortality; Supervised machine learning; Predictive model
- From: International Journal of Surgery 2025;52(11):753-760
- Country: China
- Language:Chinese
- Abstract:
Objective: To construct an optimal model for predicting in-hospital mortality risk in emergency multiple trauma patients by comparing different supervised machine learning algorithms.
Methods: A retrospective analysis was conducted on the clinical data of 817 emergency multiple trauma patients admitted to the Daxing Teaching Hospital, Capital Medical University, from January 2019 to December 2023. There were 602 males and 215 females, aged 18 to 89 years, with a mean age of (54.82±17.25) years. Patients' general information and laboratory test results were collected as candidate predictor variables, and in-hospital mortality was defined as the study endpoint. Patients were divided by simple randomization into a training set (n=571) and a testing set (n=246) at a 7:3 ratio. In the training set, univariate analysis compared the candidate variables between the survival and death groups; variables with statistically significant differences then entered LASSO regression, and those with non-zero coefficients were retained as the final features. Three supervised machine learning algorithms were used to construct prediction models: logistic regression (LR), random forest (RF), and support vector machine (SVM). The performance of each model was evaluated in the testing set, and its predictive efficacy was verified with the receiver operating characteristic (ROC) curve. Normally distributed measurement data were expressed as mean±standard deviation (x̄±s) and compared between groups using the t-test. Non-normally distributed measurement data were expressed as median (interquartile range) [M(Q1, Q3)] and compared between groups using the rank-sum test. Count data were expressed as number of cases (percentage) and compared between groups using the chi-square test or Fisher's exact test.
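The 7:3 simple random split can be sketched in Python with only the standard library. This is an illustration, not the authors' code: the seed is hypothetical, and truncating 0.7 × 817 = 571.9 to 571 is an assumption that happens to reproduce the reported set sizes (571 and 246).

```python
import random

def simple_random_split(n_patients, train_fraction=0.7, seed=2019):
    """Simple (unstratified) randomization of patient indices into training and testing sets."""
    rng = random.Random(seed)  # hypothetical seed; the abstract does not report one
    indices = list(range(n_patients))
    rng.shuffle(indices)  # every ordering equally likely, regardless of outcome
    # Truncation (0.7 * 817 = 571.9 -> 571) is an assumption that matches the reported sizes
    n_train = int(n_patients * train_fraction)
    return indices[:n_train], indices[n_train:]

train_idx, test_idx = simple_random_split(817)
print(len(train_idx), len(test_idx))  # prints 571 246
```

Because the split is unstratified, the number of deaths (65 in total) that falls into each set is left to chance; a stratified split that shuffles survivors and deaths separately is a common alternative when the outcome is this rare.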
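The LASSO step keeps only the features whose coefficients survive an L1 penalty. The abstract does not say which LASSO variant, solver, or penalty strength λ was used, so the sketch below assumes an L1-penalized logistic regression fitted by proximal gradient descent on standardized features, with λ supplied by hand (in practice it is usually chosen by cross-validation). The soft-thresholding step is what sets weak coefficients to exactly zero. The function names and toy data are hypothetical.

```python
import math

def _sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(z, t):
    """Proximal operator of t*|w|: shrink z toward 0, returning exactly 0.0 if |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_logistic(X, y, lam, step=0.5, n_iter=2000):
    """Minimize mean log-loss + lam * sum_j |w_j| by proximal gradient descent.

    X: list of rows of standardized features; y: list of 0/1 outcomes (1 = death).
    The intercept b is not penalized. For guaranteed convergence, step should not
    exceed 1/L, where L is the Lipschitz constant of the log-loss gradient.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        probs = [_sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]
        resid = [pi - yi for pi, yi in zip(probs, y)]
        grad_b = sum(resid) / n
        grad_w = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b -= step * grad_b  # plain gradient step for the unpenalized intercept
        # Gradient step followed by soft-thresholding for the penalized weights
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b

def selected_features(names, w):
    """Keep the features whose LASSO coefficient is non-zero."""
    return [name for name, wj in zip(names, w) if wj != 0.0]

# Hypothetical toy data: x_informative separates deaths from survivors, x_noise does not
X_toy = [[-2.0, 1.0], [-1.5, -1.0], [-1.0, 1.0], [-0.5, -1.0],
         [0.5, 1.0], [1.0, -1.0], [1.5, 1.0], [2.0, -1.0]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]
names = ["x_informative", "x_noise"]
w_small, _ = lasso_logistic(X_toy, y_toy, lam=0.05)
w_large, _ = lasso_logistic(X_toy, y_toy, lam=5.0)
print(selected_features(names, w_small))
print(selected_features(names, w_large))  # prints []
```

With a large enough λ every coefficient stays at zero and nothing is selected; lowering λ lets the most informative features enter first, which is the behavior the "non-zero coefficient" selection rule relies on.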
Results: Of the 817 patients included, 65 died, an in-hospital mortality of 8.0%. After univariate analysis of the training set, LASSO regression of the variables with statistically significant differences retained 17 variables as risk factors for in-hospital mortality in emergency multiple trauma patients: age, albumin (ALB), red blood cell (RBC), creatine kinase (CK), glucose (GLU), brain natriuretic peptide (BNP), C-reactive protein (CRP), lactic acid, partial pressure of carbon dioxide (PCO2), low-density lipoprotein cholesterol (LDL-C), prothrombin time (PT), fibrinogen (FIB), fibrin degradation products (FDP), troponin I (TNI), procalcitonin (PCT), injury severity score (ISS), and Glasgow coma scale (GCS). Three supervised machine learning models were built on these 17 variables. The five most important variables were PCO2, PCT, FDP, PT, and RBC in the LR model; PCO2, ISS, GLU, ALB, and GCS in the RF model; and PCT, FDP, PCO2, PT, and GLU in the SVM model. In the testing set, the LR model had an area under the curve (AUC) of 0.952, specificity of 0.996, accuracy of 0.963, and sensitivity (recall) of 0.600. The RF model had an AUC of 0.970, higher than both the LR and SVM models, with specificity of 0.987, accuracy of 0.959, and sensitivity (recall) of 0.650. The SVM model had an AUC of 0.944, specificity of 0.996, accuracy of 0.947, and sensitivity (recall) of 0.400. Each model had its strengths: LR had the highest accuracy and, together with SVM, the highest specificity, whereas RF had the highest AUC and sensitivity and showed the best overall performance.
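Sensitivity, recall, specificity, and accuracy all come from a model's confusion matrix at a chosen probability cutoff; sensitivity and recall are the same quantity, TP/(TP+FN). The abstract does not report the confusion matrices, so the counts below are hypothetical: they assume 20 deaths and 226 survivors among the 246 testing-set patients, a split that reproduces all three models' reported sensitivity, specificity, and accuracy to three decimals.

```python
def classification_metrics(tp, fn, tn, fp):
    """Threshold-dependent metrics from a 2x2 confusion matrix (death = positive class)."""
    return {
        "sensitivity": tp / (tp + fn),  # identical to recall: deaths correctly flagged
        "specificity": tn / (tn + fp),  # survivors correctly classified as survivors
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Hypothetical confusion matrices, back-calculated under the 20-death / 226-survivor assumption
hypothetical_counts = {
    "LR": dict(tp=12, fn=8, tn=225, fp=1),
    "RF": dict(tp=13, fn=7, tn=223, fp=3),
    "SVM": dict(tp=8, fn=12, tn=225, fp=1),
}

for model, counts in hypothetical_counts.items():
    m = classification_metrics(**counts)
    print(model, {k: round(v, 3) for k, v in m.items()})
```

Under these assumed counts, RF flags one more death than LR but misclassifies two more survivors, so its accuracy (0.959) is slightly below LR's (0.963) despite its higher sensitivity: with about 8% mortality, accuracy is dominated by the survivor class.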
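AUC, by contrast, does not depend on a cutoff: it equals the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. This is how RF can lead on AUC (0.970) while LR leads on accuracy. A minimal pairwise (Mann-Whitney) implementation, adequate for a testing set of a few hundred patients, is sketched below; the function name and the toy scores are illustrative, not taken from the study.

```python
def roc_auc(scores, labels):
    """Area under the empirical ROC curve via the Mann-Whitney pairwise formulation.

    scores: predicted risk of death per patient; labels: 1 = died, 0 = survived.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one death and one survivor")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0  # death ranked above survivor
            elif sp == sn:
                wins += 0.5  # tie counts as half
    return wins / (len(pos) * len(neg))

# Illustrative (made-up) predicted risks for two deaths and two survivors
print(roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # prints 0.75
```

Counting all death-survivor pairs costs O(n_deaths × n_survivors), which is negligible here; rank-based formulas give the same value in O(n log n) for larger cohorts.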
Conclusion: The RF model constructed with the 17 selected variables, led by PCO2, ISS, GLU, ALB, and GCS, shows strong predictive capability for in-hospital mortality in emergency multiple trauma patients and warrants further clinical investigation.