1. Comparison of machine learning and Logistic regression models in predicting acute kidney injury after cardiac surgery: data analysis based on MIMIC-Ⅲ database
Wei XIONG ; Lifan ZHANG ; Kai SHE ; Guo XU ; Shanglin BAI ; Xuan LIU
Chinese Critical Care Medicine 2022;34(11):1188-1193
Objective: To establish an acute kidney injury (AKI) prediction model for patients after cardiac surgery using the extreme gradient boosting (XGBoost) machine learning model, and to explore risk and protective factors for AKI in these patients.
Methods: All patients who underwent cardiac surgery in the Medical Information Mart for Intensive Care-Ⅲ (MIMIC-Ⅲ) database were enrolled and divided into an AKI group and a non-AKI group according to whether AKI developed within 14 days after surgery. Clinical characteristics were compared between the two groups. XGBoost and Logistic regression models for predicting AKI after cardiac surgery were built using five-fold cross-validation, and the areas under the receiver operator characteristic curve (AUC) of the two models were compared. The output of the XGBoost model was interpreted with Shapley additive explanations (SHAP).
Results: A total of 6 912 patients were included, of whom 5 681 (82.2%) developed AKI within 14 days after surgery and 1 231 (17.8%) did not. Compared with the non-AKI group, patients in the AKI group were older [years: 68.0 (59.0, 76.0) vs. 62.0 (52.0, 71.0)]; had higher rates of emergency admission, obesity and diabetes (52.4% vs. 47.8%, 9.0% vs. 4.0%, 32.0% vs. 22.2%); had a lower respiratory rate [RR; times/min: 17.0 (14.0, 20.0) vs. 19.0 (15.0, 22.0)] and a lower heart rate [HR; bpm: 80.0 (67.0, 89.0) vs. 82.0 (71.5, 93.0)]; had higher blood pressure [mmHg (1 mmHg ≈ 0.133 kPa): 80.0 (70.7, 90.0) vs. 78.0 (70.0, 88.0)]; had higher hemoglobin (Hb), blood glucose, blood K⁺ and serum creatinine (SCr) levels [Hb (g/L): 122.0 (109.0, 136.0) vs. 120.0 (106.0, 135.0), blood glucose (mmol/L): 7.3 (6.1, 8.9) vs. 6.8 (5.7, 8.5), blood K⁺ (mmol/L): 4.2 (3.9, 4.7) vs. 4.2 (3.8, 4.6), SCr (μmol/L): 88.4 (70.7, 106.1) vs. 79.6 (70.7, 97.2)]; had lower albumin (ALB) and triacylglycerol (TG) levels [ALB (g/L): 38.0 (35.0, 41.0) vs. 39.0 (37.0, 42.0), TG (mmol/L): 1.4 (1.0, 2.0) vs. 1.5 (1.0, 2.2)]; and had higher incidences of multiple organ dysfunction syndrome (MODS) and sepsis (30.6% vs. 16.2%, 3.3% vs. 1.9%). All differences were significant (all P < 0.05). In the Logistic regression model, the important predictors were lactic acid [Lac; odds ratio (OR) = 1.062, 95% confidence interval (95% CI) 1.030-1.100, P = 0.005], obesity (OR = 2.234, 95% CI 1.900-2.640, P < 0.001), male sex (OR = 0.858, 95% CI 0.794-0.928, P = 0.049), diabetes (OR = 1.820, 95% CI 1.680-1.980, P < 0.001) and emergency admission (OR = 1.278, 95% CI 1.190-1.380, P < 0.001). Receiver operator characteristic (ROC) curve analysis showed that the AUC of the Logistic regression model for predicting AKI after cardiac surgery was 0.62 (95% CI 0.61-0.67). After the XGBoost parameters were optimized by grid search combined with five-fold cross-validation, the model trained well, with no overfitting or underfitting. ROC curve analysis showed that the AUC of the XGBoost model for predicting AKI after cardiac surgery was 0.77 (95% CI 0.75-0.80), significantly higher than that of the Logistic regression model (P < 0.01). When the XGBoost output was interpreted with SHAP, age and ALB were the most important predictors of the outcome: age was a risk factor (mean |SHAP value| 0.434) and ALB was a protective factor (mean |SHAP value| 0.221).
Conclusions: Age is an important risk factor for AKI after cardiac surgery, and ALB is a protective factor. Machine learning performs better than traditional Logistic regression in predicting cardiac and vascular surgery-associated AKI. XGBoost can capture more complex relationships between variables and outcomes, and can predict the risk of postoperative AKI more accurately and on a more individualized basis.
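The evaluation idea in the abstract, scoring a model by AUC under five-fold cross-validation, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names (`auc`, `k_fold_indices`, `cross_validated_auc`), the toy data, and the placeholder "model" are all hypothetical, and pooling the out-of-fold scores before computing one AUC is an assumption, since the abstract does not say how the fold results were combined.

```python
import random

def auc(labels, scores):
    """AUC as the Mann-Whitney statistic: the fraction of (AKI, non-AKI) pairs
    in which the AKI case receives the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validated_auc(labels, fit_predict, k=5, seed=0):
    """Collect out-of-fold scores from fit_predict(train_idx, test_idx),
    then compute a single AUC over the pooled scores (an assumption)."""
    pooled = [None] * len(labels)
    for test_idx in k_fold_indices(len(labels), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        for i, s in zip(test_idx, fit_predict(train_idx, test_idx)):
            pooled[i] = s
    return auc(labels, pooled)

# Toy data invented for illustration (not from MIMIC-III): age is the only feature.
ages = [45, 52, 58, 61, 64, 66, 69, 72, 75, 80]
labels = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]  # 1 = AKI, 0 = no AKI

def age_as_score(train_idx, test_idx):
    # Placeholder "model": ignores the training fold and scores by raw age.
    return [ages[i] for i in test_idx]

print(cross_validated_auc(labels, age_as_score))  # prints 0.875
```

In practice `fit_predict` would fit a real classifier (for example Logistic regression or XGBoost) on the training indices and return predicted probabilities for the held-out indices; an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.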