1. Construction and external validation of a machine learning-based prediction model for epilepsy one year after acute stroke.
Wenkao ZHOU ; Fangli ZHAO ; Xingqiang QIU ; Yujuan YANG ; Tingting WANG ; Lingyan HUANG
Chinese Critical Care Medicine 2025;37(5):445-451
OBJECTIVE:
To identify the optimal machine learning algorithm for predicting post-stroke epilepsy (PSE) within one year following acute stroke, establish a nomogram model based on this algorithm, and perform external validation to achieve accurate prediction of secondary epilepsy.
METHODS:
A total of 870 acute stroke patients admitted to the emergency department of Xiang'an Hospital of Xiamen University from June 2019 to June 2023 were enrolled for model development (model group). An external validation cohort of 435 acute stroke patients admitted to the Fifth Hospital of Xiamen during the same period was used to validate the machine learning algorithms and the nomogram model. Patients were classified into control and epilepsy groups according to whether they developed PSE within one year. Clinical and laboratory data, including baseline characteristics, stroke location, vascular status, complications, hematologic parameters, and National Institutes of Health Stroke Scale (NIHSS) score, were collected for analysis. Nine machine learning algorithms (logistic regression, CN2 rule induction, K-nearest neighbors, adaptive boosting, random forest, gradient boosting, support vector machine, naive Bayes, and neural network) were compared for predictive performance. The area under the receiver operating characteristic (ROC) curve (AUC) was used to identify the optimal algorithm. Logistic regression was used to screen risk factors for PSE, and the top 10 predictors were selected to construct the nomogram model. The predictive performance of the model was evaluated using the ROC curve in both the model and validation groups.
RESULTS:
Among the 870 patients in the model group, 29 developed PSE within one year. Among the nine algorithms tested, logistic regression demonstrated the best performance and generalizability, with an AUC of 0.923. Univariate logistic regression identified several risk factors for PSE, including platelet count, white blood cell count, red blood cell count, glycated hemoglobin (HbA1c), C-reactive protein (CRP), triglycerides, high-density lipoprotein (HDL), aspartate aminotransferase (AST), alanine aminotransferase (ALT), activated partial thromboplastin time (APTT), thrombin time, D-dimer, fibrinogen, creatine kinase (CK), creatine kinase-MB (CK-MB), lactate dehydrogenase (LDH), serum sodium, lactic acid, anion gap, NIHSS score, brain herniation, periventricular stroke, and carotid artery plaque. Multivariate logistic regression analysis then showed that white blood cell count, HDL, fibrinogen, lactic acid, and brain herniation were independent risk factors [odds ratios (OR) were 1.837, 198.039, 47.025, 11.559, and 70.722, respectively, all P < 0.05]. In the external validation group, univariate logistic regression analysis showed that platelet count, white blood cell count, CRP, triglycerides, APTT, D-dimer, fibrinogen, CK, CK-MB, LDH, NIHSS score, and brain herniation were risk factors for PSE within one year after acute stroke. Multivariate logistic regression analysis then showed that APTT and brain herniation were independent predictors (OR were 0.587 and 116.193, respectively, both P < 0.05). The nomogram model, constructed using 10 key variables (brain herniation, periventricular stroke, carotid artery plaque, white blood cell count, triglycerides, thrombin time, D-dimer, serum sodium, lactic acid, and NIHSS score), achieved an AUC of 0.908 in the model group and 0.864 in the external validation group.
CONCLUSIONS:
Among the machine learning algorithms compared, logistic regression showed the best performance for predicting epilepsy within one year after acute stroke. The nomogram model built on the logistic regression-derived predictors showed strong discriminative power and was successfully validated externally, suggesting favorable clinical applicability and generalizability.
Humans; Machine Learning; Stroke/complications*; Nomograms; Epilepsy/etiology*; Algorithms; Male; Female; Logistic Models; Middle Aged; Aged; Risk Factors; Bayes Theorem
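The abstract above selects the best algorithm by comparing AUC values. As a minimal sketch of that comparison step (not the authors' code), the AUC can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc`, the model names, and the toy labels and scores below are all hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = developed PSE, 0 = did not.
labels = [0, 0, 0, 0, 1, 1]
model_scores = {
    "model_a": [0.1, 0.2, 0.3, 0.6, 0.5, 0.9],
    "model_b": [0.4, 0.1, 0.8, 0.3, 0.2, 0.7],
}

# Score every candidate model and keep the one with the highest AUC.
aucs = {name: roc_auc(labels, s) for name, s in model_scores.items()}
best = max(aucs, key=aucs.get)
print(aucs, best)
```

With only 29 events among 870 patients, as reported above, an AUC estimated this way can vary noticeably between resamples, which is one reason an external validation cohort is useful.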
2. Comparison of the efficacy and construction of prediction model for relapse free survival in breast cancer based on diabetes mellitus type 2
Wenkao ZHOU ; Hesen HUANG ; Yimei PAN ; Lingyan HUANG ; Mingshan WANG ; Fangli ZHAO ; Ya WANG ; Huimin TANG
Journal of International Oncology 2025;52(5):295-303
Objective: To construct univariate and multivariate relapse free survival (RFS) prediction models for breast cancer patients with diabetes mellitus type 2 (T2DM) and to compare and select the model with higher predictive performance. Methods: A total of 912 breast cancer patients treated at the First Affiliated Hospital of Dalian Medical University from January 2010 to December 2016 were included, of whom 202 had T2DM and 710 did not. Kaplan-Meier survival curves were drawn by T2DM status and compared with the log-rank test. All patients were randomly divided into a training set (n=640) and a validation set (n=272) at a ratio of 7∶3. Univariate and multivariate Cox proportional hazards regression models, fitted with the "survival" package, were used to analyze RFS in breast cancer patients. The "rms" package was employed to construct univariate and multivariate RFS prediction models for breast cancer patients with T2DM. Clinical decision curves and calibration curves were used to validate the models. The receiver operating characteristic (ROC) curve was used to compare the predictive performance of the two models. Results: There were no statistically significant differences between the training set and the validation set in age, T2DM, surgical approach, axillary management method, T stage, N stage, molecular subtype, estrogen receptor (ER) 1, ER2, progesterone receptor (PR), ER and PR consistency, Ki67, or human epidermal growth factor receptor 2 (HER2) (all P>0.05). There was a statistically significant difference in histological grade (χ²=7.59, P=0.022). Survival analysis showed that the 5-year RFS rate was 83.7% in patients with T2DM and 92.3% in patients without T2DM (χ²=16.61, P<0.001).
Univariate analysis revealed that age (HR=1.04, 95% CI: 1.03-1.06, P<0.001), T2DM (HR=2.31, 95% CI: 1.49-3.55, P<0.001), surgical approach (HR=2.39, 95% CI: 1.20-4.77, P=0.013), axillary management method (HR=2.62, 95% CI: 1.72-3.98, P<0.001), T stage (T2: HR=2.13, 95% CI: 1.36-3.31, P<0.001; T3: HR=6.90, 95% CI: 3.35-14.22, P<0.001), N stage (N2: HR=3.87, 95% CI: 2.12-7.07, P<0.001; N3: HR=8.61, 95% CI: 4.71-15.75, P<0.001), molecular subtype (Luminal B: HR=2.74, 95% CI: 1.17-6.36, P=0.019; HER2+: HR=3.64, 95% CI: 1.38-9.58, P=0.009; TNBC: HR=4.40, 95% CI: 1.71-11.34, P=0.002), ER1 (>10%: HR=0.57, 95% CI: 0.37-0.90, P=0.016), ER2 (HR=0.57, 95% CI: 0.37-0.89, P=0.015), and PR (HR=0.56, 95% CI: 0.37-0.86, P=0.008) were all factors influencing RFS in breast cancer patients. Multivariate analysis demonstrated that age (HR=1.04, 95% CI: 1.02-1.06, P<0.001), T2DM (HR=1.82, 95% CI: 1.16-2.85, P=0.009), T stage (T2: HR=1.60, 95% CI: 1.01-2.54, P=0.046; T3: HR=2.64, 95% CI: 1.22-5.72, P=0.014), N stage (N2: HR=3.72, 95% CI: 2.01-6.88, P<0.001; N3: HR=5.34, 95% CI: 2.78-10.25, P<0.001), and ER1 (>10%: HR=0.63, 95% CI: 0.39-0.99, P=0.046) were independent factors influencing RFS in breast cancer patients. Based on the 10 and 5 variables with P<0.05 in the univariate and multivariate analyses, respectively, nomograms of the univariate and multivariate prediction models were constructed to evaluate the influence of factors such as T2DM on the postoperative RFS of breast cancer patients. Clinical decision curves and calibration curves indicated that both models had high predictive value for RFS in breast cancer patients, and the predicted results were highly consistent with the observed results.
ROC curve analysis showed no statistically significant difference between the two models in the area under the curve (AUC) for predicting RFS of breast cancer patients in the training set and validation set at 36, 60, and 84 months (all P>0.05), indicating that the predictive efficacy of the two models was comparable. The multivariate model is more suitable for clinical application because it uses fewer variables. Conclusions: Breast cancer patients with T2DM have a poorer prognosis. Age, T2DM, T stage, N stage, and ER1 are independent factors influencing postoperative RFS in breast cancer patients. The multivariate prediction model of RFS in breast cancer patients based on T2DM is more suitable for clinical application because it achieves comparable predictive efficacy with fewer variables.
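The 5-year RFS rates in the abstract above come from Kaplan-Meier curves. A minimal sketch of the product-limit estimator is given here for orientation. It is an illustration in plain Python, not the authors' analysis, which used the "survival" and "rms" packages. The function name `kaplan_meier` and the six follow-up records are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up time for each patient (e.g. months)
    events : 1 if relapse was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        relapses = 0  # relapses observed exactly at time t
        leaving = 0   # everyone who leaves the risk set at time t
        while i < len(data) and data[i][0] == t:
            relapses += data[i][1]
            leaving += 1
            i += 1
        if relapses > 0:
            # Product-limit step: multiply by the fraction surviving past t.
            surv *= 1 - relapses / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving
    return curve


# Hypothetical follow-up records (months, relapse indicator).
times = [12, 24, 24, 36, 60, 60]
events = [1, 0, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"month {t}: RFS = {s:.3f}")
```

Running the estimator separately for patients with and without T2DM would give the two curves that the log-rank test compares.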
3. Dynamic changes and time-dependent analysis of mortality risk factors in severe pneumonia patients
Wenkao ZHOU ; Lide SU ; Lingyan HUANG ; Ailin GUO ; Yimei PAN ; Zonghong LIU ; Yaben YAO
Chinese Journal of Emergency Medicine 2025;34(8):1071-1077
Objective: To analyze mortality risk factors in patients with severe pneumonia and investigate how their influence varies across different time periods. Methods: A total of 134 patients with severe pneumonia admitted to the Emergency Department of Xiang’an Hospital, Xiamen University, between June 2019 and February 2020 were enrolled. All patients were treated in the EICU and followed up for four years. Based on outcomes, they were categorized into a death group (n=77) and a survival group (n=57). Cox regression analysis was employed to identify mortality risk factors at different time points, while logistic regression analysis was used to assess risk factors influencing mortality during ICU stay, during hospitalization, and at 1-month and 1-year follow-up. Results: Mortality rates were 11.9% (n=16) during ICU admission, 20.8% (n=28) during hospitalization, 16.4% (n=22) within 1 month, and 31.3% (n=42) within 1 year. By the end of the follow-up, 57.4% (n=77) of patients had died. Ten mortality risk factors were identified, with the number increasing over time. During ICU admission and hospitalization, significant risk factors included total bilirubin level, APACHE-II score, invasive ventilation, ARDS, and vasopressor use in the ICU. One-month mortality risk additionally involved bacterial infection. One-year mortality risk further incorporated advanced age and chronic heart failure. By the end of follow-up, acute kidney injury (AKI) during ICU admission also emerged as a contributing factor, while higher body weight was identified as a protective factor. Conclusions: The number of mortality risk factors in severe pneumonia patients increases progressively over time. Early-stage factors during hospitalization and ICU admission exert a stronger impact on short-term mortality, whereas bacterial infection, advanced age, and chronic heart failure become increasingly significant in later stages.
These findings highlight the dynamic nature of risk factors and underscore the importance of tailored monitoring and intervention strategies at different disease phases.
4. An evidence-based predictive model for early recurrence risk after hepatocellular carcinoma surgery and external validation study
Wenkao ZHOU ; Fangli ZHAO ; Jiajia CHEN ; Lei CHEN ; Lingyan HUANG ; Yue WANG ; Huimin TANG
Cancer Research and Clinic 2024;36(11):835-842
Objective: To construct an evidence-based prediction model for early recurrence after surgery for hepatocellular carcinoma (HCC) based on Meta-analysis and to perform external validation. Methods: Chinese National Knowledge Infrastructure, Wanfang, VIP, Chinese Science Citation Database (CSCD), Chinese Social Sciences Citation Index (CSSCI), PubMed, Web of Science and IEEE databases were searched by subject terms for studies published between January 2019 and December 2023. According to the inclusion and exclusion criteria, 9 studies were included to screen the risk factors affecting the early recurrence of HCC. When the same risk factor was reported in ≥5 of the included studies, Meta-analysis was performed using Review Manager 5.4.1 software. External validation data were collected from 401 patients with primary HCC who underwent surgery in Liaoning Cancer Hospital between March 2014 and March 2017. The patients were divided into an early recurrence group (176 cases) and an early non-recurrence group (225 cases) according to whether they relapsed within 2 years after surgery. The pooled OR values of the risk factors obtained in the Meta-analysis were converted into model coefficients, the postoperative early recurrence rate of HCC in the Meta-analysis was used to calculate the intercept β0, and the logistic model was thereby obtained. The model was then applied to the external validation data to calculate the predicted probability of early recurrence (P) for each patient. With recurrence within 2 years after surgery as the dependent variable and P as the independent variable, the receiver operating characteristic (ROC) curve was drawn and the area under the curve (AUC) was calculated.
Results: A total of 8 risk factors for early HCC recurrence were screened out from the 9 studies (x1: alpha-fetoprotein ≥ 400 ng/ml; x2: tumor number ≥ 2; x3: longest tumor diameter ≥ 5 cm; x4: Barcelona stage B-C; x5: microvascular invasion; x6: moderate to poor differentiation; x7: incomplete capsule; x8: nonanatomic hepatectomy). The Meta-analysis included 1,757 HCC cases, with 960 postoperative early recurrences and an early recurrence rate of 45.36%; the resulting β0 value was -0.201. The predictive model for 2-year recurrence of HCC was logit (P) = -0.201 + 0.835x1 + 0.905x2 + 0.783x3 + 1.008x4 + 0.765x5 + 0.831x6 + 1.533x7 + 0.940x8. Between-group comparison in the external validation data showed that the differences in ascites, alpha-fetoprotein, tumor number, tumor diameter, Barcelona stage, microvascular invasion, degree of tumor differentiation, capsule invasion, resection type, and systemic inflammation index were statistically significant between the early recurrence group and the early non-recurrence group (all P < 0.05). ROC curve analysis showed that the AUC of the model for predicting postoperative early recurrence of HCC was 0.718 (95% CI: 0.689-0.753), the optimal cut-off value was 3.11, the Youden index was 0.288, the sensitivity was 69.32%, and the specificity was 69.56%. Conclusions: The evidence-based prediction model for postoperative early recurrence of HCC constructed from Meta-analysis has a high predictive value. However, further verification and optimization with big data are still needed.
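The logit equation reported in the abstract above can be applied to an individual patient as follows. This is a minimal sketch: the intercept and coefficients are copied from the abstract, while the dictionary keys, the function names, and the example patient are hypothetical labels introduced only for illustration. Each risk factor is assumed to be coded 1 if present and 0 if absent.

```python
import math

# Intercept and coefficients as reported in the abstract (x1..x8).
INTERCEPT = -0.201
COEFFICIENTS = {
    "afp_ge_400": 0.835,               # x1: alpha-fetoprotein >= 400 ng/ml
    "tumor_number_ge_2": 0.905,        # x2
    "diameter_ge_5cm": 0.783,          # x3
    "barcelona_stage_b_c": 1.008,      # x4
    "microvascular_invasion": 0.765,   # x5
    "moderate_poor_diff": 0.831,       # x6
    "incomplete_capsule": 1.533,       # x7
    "nonanatomic_hepatectomy": 0.940,  # x8
}


def recurrence_logit(patient):
    """Linear predictor logit(P); factors missing from `patient` count as 0."""
    return INTERCEPT + sum(
        coef * patient.get(name, 0) for name, coef in COEFFICIENTS.items()
    )


def recurrence_probability(patient):
    """Inverse-logit transform of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-recurrence_logit(patient)))


# Hypothetical patient with elevated AFP and microvascular invasion only.
patient = {"afp_ge_400": 1, "microvascular_invasion": 1}
print(round(recurrence_logit(patient), 3))
print(round(recurrence_probability(patient), 3))
```

For a patient with none of the eight factors the linear predictor reduces to the intercept, so the predicted probability is about 0.45.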
