Development and validation of a machine learning-based prognostic model for portal vein thrombosis in liver cirrhosis
10.3760/cma.j.cn113884-20250323-00094
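- VernacularTitle: Construction and evaluation of a machine learning-based prognostic prediction model for patients with liver cirrhosis complicated by portal vein thrombosis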
- Author: Junqi YUAN (1); Sa LYU; Jun LING; Yiwen XU; Hui FENG; Shaoli YOU; Fuquan LIU; Limei YU; Bing ZHU
- Author Information:
1. Guizhou Provincial Key Laboratory of Cell Engineering, Affiliated Hospital of Zunyi Medical University, Zunyi 563003
- Publication Type:Journal Article
- Keywords: Liver cirrhosis; Portal vein thrombosis; Machine learning; Prognostic models
- From: Chinese Journal of Hepatobiliary Surgery 2025;31(7):497-502
- Country: China
- Language:Chinese
- Abstract:
Objective: To analyze the prognostic factors of patients with liver cirrhosis and portal vein thrombosis (PVT) and to construct a prognostic prediction model using machine learning methods.
Methods: The clinical data of 388 patients with liver cirrhosis and PVT admitted to the Fifth Medical Center of PLA General Hospital from January 2022 to April 2024 were retrospectively collected and analyzed; there were 243 males and 145 females, aged (56.9±10.9) years. The patients were randomly divided into a training set (n=310) and a testing set (n=78) in a 4:1 ratio. The Boruta algorithm was used to screen key features in the training set, and four machine learning algorithms (random forest, support vector machine, generalized linear model and Bayesian model) were then used to build survival prediction models. All patients were followed up for 1 year for survival. Model performance was evaluated with receiver operating characteristic (ROC) curves in the training and testing sets, and feature importance was ranked by SHAP (SHapley Additive exPlanations) values.
Results: In the training set, 250 patients (80.6%) survived and 60 (19.4%) died. Compared with the survival group, the death group had a higher model for end-stage liver disease (MELD) score, total bilirubin, serum creatinine, prothrombin time, international normalized ratio, D-dimer and white blood cell count, higher proportions of severe ascites and Child-Pugh grade C liver function, and a lower red blood cell count and hematocrit; all differences were statistically significant (all P<0.05). The areas under the ROC curve (AUCs) of the random forest, support vector machine, generalized linear model and Bayesian model for predicting survival were 0.92, 0.78, 0.81 and 0.71 in the training set and 0.81, 0.72, 0.67 and 0.68 in the testing set, respectively. Random forest had the best predictive performance, with an accuracy of 81.7%, a sensitivity of 84.6% and a specificity of 76.9% in the testing set. In the feature-importance analysis of the random forest model, total bilirubin, red blood cell count, hematocrit, serum creatinine and ascites grade, among others, contributed most to the model.
Conclusion: Among the machine learning-based survival prediction models for patients with liver cirrhosis and PVT, the random forest model showed the best predictive performance, and total bilirubin may be the most important factor affecting patient survival.
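The evaluation steps named in the Methods and Results (a random 4:1 split, ROC AUC, and accuracy, sensitivity and specificity in the testing set) can be illustrated with a minimal standard-library Python sketch. This is not the authors' code: the function names, the fixed seed, the 0.5 decision threshold and the convention that label 1 marks the positive class are assumptions for illustration only, and the scores are assumed to come from an already fitted classifier (such as a random forest), which is not shown.

```python
import random
from typing import Sequence


def split_4_to_1(ids: Sequence[int], seed: int = 0) -> tuple[list[int], list[int]]:
    """Randomly split patient IDs into training and testing sets in a 4:1 ratio.

    The seed is a hypothetical choice; the abstract does not report one.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * 4 / 5)  # 388 patients -> 310 train, 78 test
    return shuffled[:n_train], shuffled[n_train:]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> tuple[float, float, float]:
    """Accuracy, sensitivity and specificity when a case is predicted
    positive if its score >= threshold (threshold is an assumption)."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity
```

With hypothetical labels [1, 1, 1, 0, 0] and scores [0.9, 0.8, 0.4, 0.3, 0.6], `roc_auc` returns 5/6 and `threshold_metrics` returns an accuracy of 0.6, a sensitivity of 2/3 and a specificity of 0.5. The pairwise AUC is O(n²) but transparent; for a testing set of 78 patients that cost is negligible.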