A study on the risk prediction model for cryptogenic stroke in patients with right-to-left shunt
10.3760/cma.j.cn371468-20231202-00281
- VernacularTitle:伴右向左分流隐源性卒中患者发病风险预测模型研究
- Author: Sujuan TANG (1); Qingwen WU; Linger LI; Daojing LI; Hongqin ZHAO
Author Information
1. Department of Neurology, the Affiliated Hospital of Qingdao University, Qingdao 266035
- Keywords: Cryptogenic stroke; Right-to-left shunt; Machine learning; Predictive model; Random forest model
- From: Chinese Journal of Behavioral Medicine and Brain Science 2024;33(6):505-512
- Country: China
- Language: Chinese
- Abstract:
Objective: To predict the risk of cryptogenic stroke (CS) in patients with right-to-left shunt (RLS) using machine learning, and to provide a potential approach to accurate and efficient CS prediction.
Methods: Clinical data were retrospectively analyzed for 289 subjects with RLS confirmed by contrast-enhanced transcranial Doppler (c-TCD) who were treated in the Department of Neurology, Laoshan Campus, the Affiliated Hospital of Qingdao University, from January 2018 to September 2023. The data included demographic information, medical history, laboratory test indicators, diagnosis, and treatment. The dataset was randomly divided into a training set and a testing set at a ratio of 8:2 using the machine learning function train_test_split(). Risk prediction models for CS in RLS subjects were built with algorithms including logistic regression, decision tree, random forest, extreme gradient boosting, artificial neural network, gradient boosting, extra trees, and adaptive boosting. Model performance was evaluated with receiver operating characteristic (ROC) curves, area under the curve (AUC), confusion matrices, precision, recall, accuracy, F1 score, calibration curves, and decision curve analysis. The best-performing model was interpreted using feature importance and SHAP values. The t-test, Mann-Whitney U test, and χ² test were performed in SPSS 25.0, and the DeLong test was used to compare the AUCs of two models.
Results: Of the 289 RLS subjects, 166 (57.5%) had CS and 123 (42.5%) did not. Blood biochemical indicators such as D-dimer, mean platelet volume, and fibrinogen were higher in CS patients than in non-CS patients (all P<0.01). No variable differed significantly between the training and testing sets (all P>0.05). In the testing set, the random forest model achieved the highest AUC (0.885), precision (0.806), recall (0.879), accuracy (0.810), and F1 score (0.841) for CS risk prediction. Its calibration curve was the closest to the reference line, and decision curve analysis indicated that it had a greater net benefit. Interpretability analysis identified mean platelet volume, D-dimer, international normalized ratio, body mass index, and age as high-risk factors.
Conclusion: The random forest-based prediction tool performed well, predicting CS risk in the RLS population with high accuracy.
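
The threshold metrics reported above (precision, recall, accuracy, F1 score) all derive from a single confusion matrix, and the following minimal sketch shows how. Two things in it are assumptions rather than facts from the abstract: that train_test_split() is the scikit-learn function, which rounds the test-set size up (giving 58 test subjects out of 289), and the confusion-matrix counts themselves, which are hypothetical values chosen only because they are consistent with the rounded metrics reported for the random forest model. The study's actual confusion matrix is not given in this record.

```python
import math


def classification_metrics(tp, fn, fp, tn):
    """Compute precision, recall, accuracy and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# From the abstract: 289 subjects split 8:2 into training and testing sets.
n_total = 289
# Assumption: the test-set size is rounded up, as scikit-learn does.
n_test = math.ceil(0.2 * n_total)
n_train = n_total - n_test

# Hypothetical counts (NOT taken from the paper): CS is the positive class.
tp, fn, fp, tn = 29, 4, 7, 18

metrics = classification_metrics(tp, fn, fp, tn)
print(f"train/test sizes: {n_train}/{n_test}")
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC is not reproduced here because it depends on the predicted probabilities for every test subject, not on a single thresholded confusion matrix.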