Prediction of neurological function rehabilitation outcome for stroke patients using interpretable machine learning models
10.3969/j.issn.1006-9771.2026.04.010
- VernacularTitle:可解释的机器学习模型对脑卒中患者神经功能康复结局的预测效能
- Author:Shun GUI 1; Jianfei ZHANG 1; Huizhi HUANG 1
Author Information
1. Fuzhou First People's Hospital, Fuzhou, Jiangxi 344100, China
- Publication Type:Journal Article
- Keywords:stroke; neurological function; rehabilitation; outcome; machine learning; predictive model
- From:Chinese Journal of Rehabilitation Theory and Practice 2026;32(4):463-472
- Country:China
- Language:Chinese
- Abstract:
Objective: To develop a machine learning (ML)-based prediction model for neurological rehabilitation outcomes in stroke patients.
Methods: A total of 420 stroke patients admitted to Fuzhou First People's Hospital from October 2022 to October 2024 were enrolled as the training set. According to their modified Rankin Scale (mRS) scores three months after discharge, the patients were divided into a good prognosis group (n = 289) and a poor prognosis group (n = 131). An additional 180 stroke patients hospitalized in the same hospital from November 2024 to April 2025 were selected as the validation set. Univariate analysis, least absolute shrinkage and selection operator (LASSO) regression, and multivariate logistic regression were performed to identify independent factors influencing the prognosis of neurological function recovery. Using the screened independent factors as feature variables, six ML models were established: logistic regression, linear discriminant analysis, naive Bayes, support vector machine, random forest and extreme gradient boosting (XGBoost). The area under the receiver operating characteristic curve (AUC), confusion matrix indicators (accuracy, precision, recall and F1-score), calibration curves and decision curve analysis were used to evaluate the predictive efficacy, calibration and clinical net benefit of each model, with external validation conducted in the validation set. The SHapley Additive exPlanations (SHAP) framework was used to interpret the optimal model, and bar charts were used to visualize its feature importance.
Results: Age, National Institutes of Health Stroke Scale (NIHSS) score, collateral circulation grading, fasting plasma glucose (FPG), lymphocyte percentage (LYMPH%) and homocysteine (Hcy) were independent risk factors for poor neurological rehabilitation prognosis (P < 0.05). For the XGBoost model, the AUCs in the training and validation sets were 0.963 (95% CI 0.947 to 0.979) and 0.825 (95% CI 0.764 to 0.885), respectively; accuracy was 88.81% and 77.22%, precision 92.86% and 68.42%, recall 69.47% and 47.27%, and F1-score 79.48% and 55.91%. XGBoost was also optimal among the models in both calibration and clinical net benefit. The feature importance ranking for the XGBoost model, from high to low, was NIHSS score, age, collateral circulation grading, FPG, Hcy and LYMPH%.
Conclusion: The interpretable XGBoost ML model exhibits excellent predictive efficacy and favorable clinical applicability in predicting neurological rehabilitation outcomes for stroke patients.
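The reported confusion matrix indicators can be cross-checked against one another. The sketch below is illustrative only and is not the authors' code: it recomputes the F1-score as the harmonic mean of the reported precision and recall, and then back-calculates training-set cell counts from the reported group sizes. It assumes that the poor prognosis group (n = 131) is the positive class; the abstract does not state this, and the resulting counts are inferred, not reported. The function names are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def counts_from_metrics(n_pos: int, n_neg: int, precision: float, recall: float):
    """Back-calculate (TP, FP, FN, TN) from group sizes and rounded metrics.

    Assumption: n_pos is the size of the positive class, taken here to be
    the poor prognosis group. The counts are inferred, not reported.
    """
    tp = round(recall * n_pos)        # recall = TP / (TP + FN)
    fp = round(tp / precision - tp)   # precision = TP / (TP + FP)
    fn = n_pos - tp
    tn = n_neg - fp
    return tp, fp, fn, tn


# Precision and recall as reported in the abstract for XGBoost.
train_f1 = f1_score(0.9286, 0.6947)
valid_f1 = f1_score(0.6842, 0.4727)

# Training set: 131 poor prognosis (assumed positive), 289 good prognosis.
tp, fp, fn, tn = counts_from_metrics(131, 289, 0.9286, 0.6947)
train_acc = (tp + tn) / (tp + fp + fn + tn)

print(f"F1 (training):   {train_f1:.2%}")
print(f"F1 (validation): {valid_f1:.2%}")
print(f"Inferred training counts: TP={tp}, FP={fp}, FN={fn}, TN={tn}")
print(f"Accuracy implied by inferred counts: {train_acc:.2%}")
```

Under this assumption, the recomputed values can be compared directly with the reported F1-scores and training-set accuracy. The same check cannot be run for the validation set from the abstract alone, because its group sizes are not given.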