1. Study on the application of classification tree model in screening the risk factors of ischemic stroke
Shuang YAO ; Hao LI ; Kaixiang LIU ; Guangpeng LENG ; Jian YU
Chinese Critical Care Medicine 2018;30(10):973-977
Objective To construct a prediction model for the risk of ischemic stroke (IS) using a classification tree model and to evaluate its application value. Methods By cluster sampling, 858 IS patients with complete clinical data at the Affiliated Hospital of Guilin Medical College from January to December 2017 were enrolled (IS group), and 844 individuals undergoing health checkups in the same period, matched to the IS patients by gender and age, were enrolled as controls (healthy control group). The metabolic characteristics of the two groups were compared. A classification tree model was used to construct the prediction model for IS risk, and the gain diagram, index chart, risk value of misclassification probability and receiver operating characteristic (ROC) curve were used to evaluate the application value of the model. Results Compared with the healthy control group, body mass index (BMI), fasting blood glucose (FPG), triglyceride (TG), total cholesterol (TC) and low density lipoprotein cholesterol (LDL-C) in the IS group were significantly increased [BMI (kg/m²): 25.34±3.70 vs. 24.24±3.10, FPG (mmol/L): 6.79±2.89 vs. 5.73±1.17, TG (mmol/L): 1.62±1.06 vs. 1.44±1.06, TC (mmol/L): 4.70±2.73 vs. 4.35±0.79, LDL-C (mmol/L): 3.18±0.94 vs. 2.73±0.73, all P < 0.01], high density lipoprotein cholesterol (HDL-C) was significantly decreased (mmol/L: 1.12±0.33 vs. 1.35±0.36, P < 0.01), and the proportions of hypertension, smoking and drinking were significantly increased (69.0% vs. 41.9%, 23.1% vs. 16.8%, 19.2% vs. 13.4%, all P < 0.01). Each factor was assigned a value [IS: No = 0, Yes = 1; BMI: < 24.0 kg/m² = 0, ≥ 24.0 kg/m² = 1; FPG: < 7.0 mmol/L = 0, ≥ 7.0 mmol/L = 1; TG: < 2.26 mmol/L = 0, ≥ 2.26 mmol/L = 1; TC: < 6.22 mmol/L = 0, ≥ 6.22 mmol/L = 1; LDL-C: < 4.14 mmol/L = 0, ≥ 4.14 mmol/L = 1; HDL-C: < 1.04 mmol/L = 0, ≥ 1.04 mmol/L = 1; hypertension: No = 0, Yes = 1; smoking: No = 0, Yes = 1; drinking: No = 0, Yes = 1], and a classification tree model was established to analyze the risk factors of IS.
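The value assignment above amounts to thresholding each continuous measurement at its cut-off. A minimal Python sketch of that coding follows; the cut-offs are those listed in the abstract, while the dictionary keys, the `encode` function and the example participant are hypothetical names and values invented for illustration, not the authors' software or data. Note that HDL-C is coded 1 at or above 1.04 mmol/L, so for this variable a value of 1 marks the favourable level.

```python
# Cut-offs from the abstract's coding scheme: a value is coded 1 when it is
# greater than or equal to the cut-off, otherwise 0.
CUTOFFS = {
    "bmi": 24.0,    # kg/m^2
    "fpg": 7.0,     # mmol/L
    "tg": 2.26,     # mmol/L
    "tc": 6.22,     # mmol/L
    "ldl_c": 4.14,  # mmol/L
    "hdl_c": 1.04,  # mmol/L (1 = at or above the cut-off)
}
BINARY_FIELDS = ("hypertension", "smoking", "drinking")  # No = 0, Yes = 1


def encode(record):
    """Return the 0/1 coding of one participant record (a plain dict)."""
    coded = {name: int(record[name] >= cut) for name, cut in CUTOFFS.items()}
    coded.update({name: int(bool(record[name])) for name in BINARY_FIELDS})
    return coded


# Hypothetical participant, invented for illustration only.
example = {
    "bmi": 26.1, "fpg": 6.2, "tg": 1.5, "tc": 5.0, "ldl_c": 4.3, "hdl_c": 0.98,
    "hypertension": True, "smoking": False, "drinking": False,
}
print(encode(example))
```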
The classification tree model consisted of 4 layers and 17 nodes: the first layer was hypertension, the second layer was FPG and HDL-C, the third layer was HDL-C and FPG, and the fourth layer was LDL-C and smoking. Five explanatory variables were selected by the model: hypertension, FPG, HDL-C, LDL-C and smoking. The first layer of the tree showed that the incidence of IS in the hypertensive population (62.6%) was significantly higher than that in the non-hypertensive population (35.2%). The second layer of the tree showed that, among people with hypertension, the incidence of IS in those with HDL-C ≥ 1.04 mmol/L (53.6%) was lower than that in those with HDL-C < 1.04 mmol/L (78.5%). In the population without hypertension, the probability of IS in those with FPG ≥ 7.0 mmol/L (71.1%) was significantly higher than that in those with FPG < 7.0 mmol/L (28.3%). The third layer of the tree showed that, in the population without hypertension and with FPG < 7.0 mmol/L, the incidence of IS in those with HDL-C ≥ 1.04 mmol/L (21.8%) was lower than that in those with HDL-C < 1.04 mmol/L (48.7%). In the population with hypertension and HDL-C ≥ 1.04 mmol/L, the probability of IS in those with FPG ≥ 7.0 mmol/L (78.6%) was significantly higher than that in those with FPG < 7.0 mmol/L (46.7%). The fourth layer of the tree showed that, in the population without hypertension and with FPG < 7.0 mmol/L and HDL-C ≥ 1.04 mmol/L, the incidence of IS in people with LDL-C ≥ 4.14 mmol/L (53.8%) was higher than that in people with LDL-C < 4.14 mmol/L (19.0%). In the population without hypertension and with FPG < 7.0 mmol/L and HDL-C < 1.04 mmol/L, the incidence of IS in smokers (76.9%) was higher than that in non-smokers (39.1%).
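As a worked check, the first-layer percentages can be approximately recovered from the group sizes and hypertension proportions reported in the Results. The sketch below assumes the node percentages are the share of IS patients among all sampled participants falling into that node (a reading of the abstract, not something it states explicitly); the variable names are illustrative.

```python
# Figures reported in the abstract.
n_is, n_ctrl = 858, 844              # IS group and healthy control group sizes
p_ht_is, p_ht_ctrl = 0.690, 0.419    # proportion with hypertension in each group

# Expected number of hypertensive participants in each group.
ht_is = n_is * p_ht_is
ht_ctrl = n_ctrl * p_ht_ctrl

# Share of IS patients within the hypertensive and non-hypertensive nodes.
share_ht = ht_is / (ht_is + ht_ctrl)
share_no_ht = (n_is - ht_is) / ((n_is - ht_is) + (n_ctrl - ht_ctrl))

print(f"hypertensive node: {share_ht:.1%}, non-hypertensive node: {share_no_ht:.1%}")
```

Under that assumption the two shares round to 62.6% and 35.2%, matching the first-layer figures, which suggests the percentages describe the composition of the matched sample at each node.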
In the population with hypertension, HDL-C ≥ 1.04 mmol/L and FPG < 7.0 mmol/L, the probability of IS in those with LDL-C ≥ 4.14 mmol/L (72.5%) was higher than that in those with LDL-C < 4.14 mmol/L (44.4%). The gain diagram of the IS classification tree model showed that the gain value increased rapidly from 0% to 100% and then tended to be stable. The index chart showed that the index value started above 100%, remained stable, and then dropped rapidly to 100%, indicating that the model performed well. The risk value of misclassification probability of the classification tree model was 0.291, and the overall correct classification rate was 70.90%. The area under the ROC curve (AUC) was 78.0% [95% confidence interval (95%CI) = 75.9%-79.9%, P < 0.001], the sensitivity was 62.5% (95%CI = 59.1%-65.7%) and the specificity was 79.4% (95%CI = 76.5%-82.1%). Conclusion The classification tree model can properly predict the risk of IS, and the most important risk factors are hypertension, hyperglycemia, high LDL-C and smoking.
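The splits reported in the Results can be written out as nested rules. The Python sketch below is a reconstruction from the percentages in this abstract only; the function and argument names are hypothetical, and treating the branches with no further reported split as terminal nodes is an inference, not a statement from the authors.

```python
from itertools import product


def node_is_share(hypertension, fpg_high, hdl_ok, ldl_high, smoker):
    """Return the reported IS percentage of the terminal node for a profile.

    fpg_high: FPG >= 7.0 mmol/L; hdl_ok: HDL-C >= 1.04 mmol/L;
    ldl_high: LDL-C >= 4.14 mmol/L. All arguments are booleans.
    """
    if hypertension:
        if not hdl_ok:
            return 78.5                       # hypertension, low HDL-C
        if fpg_high:
            return 78.6                       # hypertension, HDL-C ok, high FPG
        return 72.5 if ldl_high else 44.4     # fourth-layer LDL-C split
    if fpg_high:
        return 71.1                           # no hypertension, high FPG
    if hdl_ok:
        return 53.8 if ldl_high else 19.0     # fourth-layer LDL-C split
    return 76.9 if smoker else 39.1           # fourth-layer smoking split


# Enumerate every combination of the five binary predictors and collect the
# distinct terminal-node values that can be reached.
profiles = product([False, True], repeat=5)
leaf_values = sorted({node_is_share(*p) for p in profiles})
print(len(leaf_values), leaf_values)
```

Read this way, the tree has nine terminal nodes and eight splitting nodes (the root plus seven internal nodes), which agrees with the 17 nodes stated in the abstract.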