Construction and analysis of a machine learning-based predictive model for early neurological deterioration in patients with acute cerebral infarction

Ben HUANG; Mingxuan ZHENG; Shuxian MIAO; Li WEI; Yan ZHANG

VernacularTitle:基于机器学习算法急性脑梗死患者早期神经功能恶化预测模型的构建与分析
Author: Ben HUANG ¹ ; Mingxuan ZHENG ; Shuxian MIAO ; Li WEI ; Yan ZHANG
Author Information

1. Department of Laboratory Medicine, The First Affiliated Hospital of Nanjing Medical University, Nanjing 210029, China
Publication Type: Journal Article
Keywords: Acute cerebral infarction; Early neurological deterioration; Machine learning; Prediction model; SHAP analysis
From: Chinese Journal of Laboratory Medicine 2025;48(12):1535-1545
Country: China
Language: Chinese
Abstract: Objective: To develop a laboratory-based predictive model for early neurological deterioration (END) in patients with acute cerebral infarction (ACI) using baseline data collected at hospital admission. Methods: This was a retrospective cohort study. Clinical and baseline laboratory test data were collected from 502 patients with ACI admitted to the Department of Neurology at our hospital between January 1, 2022 and May 31, 2025. Of these patients, 313 were male and 189 were female, with a median age of 67 years (interquartile range: 58-73). Patients were classified into an END group and a non-END group according to whether END occurred within 7 days of admission. Using the caret package in R (version 4.4.2), the dataset was randomly divided into a training set (n=351) and a validation set (n=151) at a 7∶3 ratio, with END status as the stratification variable and a fixed random seed to ensure reproducibility. After baseline characteristics were compared between groups, these datasets were used for model development and validation, respectively. Differences in clinical indicators between the two patient groups were assessed using the chi-square test and the Wilcoxon rank-sum test. In the training set, Lasso regression was used to identify variables significantly associated with END. Seven machine learning algorithms were used to develop predictive models: decision tree (DT), random forest (RF), light gradient boosting machine (LGBM), extreme gradient boosting (XGB), K-nearest neighbors (KNN), support vector machine (SVM), and logistic regression (LR). Optimal hyperparameters were determined by grid search combined with 5-fold cross-validation, and the final algorithm was selected based on a comprehensive evaluation of model performance. Additionally, clinical data from 79 patients with ACI, collected between June 1 and August 31, 2025, were compiled as an independent test set for external validation.
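The stratified 7∶3 split described above can be illustrated with a minimal sketch. This is not the authors' code: the study used the caret package in R, whereas the sketch below is plain Python, and the function name `stratified_split`, the seed value 2025, and the per-class rounding rule are all assumptions made for illustration. Only the cohort sizes (166 END and 336 non-END patients) come from the abstract.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=2025):
    """Split indices into train/validation, sampling each class separately.

    Hypothetical helper: the seed and rounding rule are illustrative
    assumptions, not values reported by the study.
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, valid_idx = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        # Take the same fraction from every class so that the END rate
        # is nearly identical in both subsets.
        n_train = round(train_fraction * len(members))
        train_idx.extend(members[:n_train])
        valid_idx.extend(members[n_train:])
    return sorted(train_idx), sorted(valid_idx)


# Cohort sizes from the abstract: 166 END (label 1), 336 non-END (label 0).
labels = [1] * 166 + [0] * 336
train_idx, valid_idx = stratified_split(labels)
print(len(train_idx), len(valid_idx))
```

With these class counts and nearest-integer rounding per class, the sketch yields 351 training and 151 validation indices, the same set sizes reported in the abstract. The rounding used by caret may differ in detail, so the agreement should be read as illustrative only.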
The external test cohort comprised 49 males and 30 females, with a median age of 68 years (interquartile range: 57-72). The SHapley Additive exPlanations (SHAP) method was used to assess feature importance and model interpretability. SHAP dependence plots and interaction plots were used to explore nonlinear relationships and interaction effects among the feature variables. Results: Among the 502 patients, 166 experienced END within 7 days of admission. Lasso regression identified nine significant predictors: history of hyperlipidemia, admission NIHSS score, lymphocyte-to-monocyte ratio (LMR), hemoglobin, D-dimer, albumin, neuron-specific enolase (NSE), homocysteine (HCY), and vitamin B12. The area under the receiver operating characteristic curve (AUC) for the seven machine learning models ranged from 0.709 to 0.946. The XGB model achieved the highest predictive performance, with an AUC of 0.946 (95% CI 0.924-0.960) in the training cohort and 0.867 (95% CI 0.902-0.933) in the validation cohort. SHAP analysis showed that the five variables contributing most to END prediction were admission NIHSS score, HCY, D-dimer, history of hyperlipidemia, and vitamin B12. Conclusion: This study developed a laboratory-based prediction model for END using the XGB machine learning algorithm, and the model demonstrated strong predictive performance.
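For readers unfamiliar with how an AUC and its 95% CI can be obtained, the following is a minimal sketch. The abstract does not state how its confidence intervals were computed, so the percentile bootstrap used here is an assumption, as are the function names `auc` and `bootstrap_ci` and the toy scores and labels, which are not study data.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (illustrative assumption)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy predicted probabilities and outcomes (1 = END), invented for illustration.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
point = auc(scores, labels)
low, high = bootstrap_ci(scores, labels, n_boot=200)
print(point, low, high)
```

In the toy data, 13 of the 16 positive-negative pairs are ranked correctly, so the point estimate is 13/16 = 0.8125. The pairwise loop is quadratic in sample size, which is acceptable for cohorts of a few hundred patients; a rank-based formula would be preferable for larger data.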