Construction and validation of a predictive model and application program for stone recurrence after endoscopic retrograde cholangiopancreatography based on machine learning algorithms in patients with common bile duct stones
10.3760/cma.j.cn115455-20241003-00833
- VernacularTitle:基于机器学习算法构建预测胆总管结石患者内镜逆行胰胆管造影术后结石复发的模型和应用程序并验证
- Author:Jian CHEN 1; Kaijian XIA; Fuli GAO; Yu DING; Ganhong WANG; Xiaodan XU
Author Information
1. Department of Gastroenterology, Changshu First People's Hospital, Changshu 215500
- Publication Type:Journal Article
- Keywords:Cholangiopancreatography, endoscopic retrograde; Choledocholithiasis; Recurrence; Machine learning; Application
- From:Chinese Journal of Postgraduates of Medicine 2025;48(5):452-460
- Country:China
- Language:Chinese
- Abstract:
Objective: To construct and validate a predictive model and application program, based on machine learning algorithms, for stone recurrence after endoscopic retrograde cholangiopancreatography (ERCP) in patients with common bile duct stones (CBDS). Methods: In a multicenter retrospective cohort study, 862 patients with CBDS who underwent ERCP from June 2020 to September 2023 were enrolled from Changshu First People's Hospital (data set 1, 759 cases, comprising a training set of 588 cases and a validation set of 171 cases) and Changshu Hospital of Traditional Chinese Medicine (data set 2, 103 cases, used as the test set). Demographics, medical history, ERCP procedural records and laboratory indices were collected. All patients were followed up for 1 year, and stone recurrence was recorded. In the training set, feature selection was performed with the least absolute shrinkage and selection operator (LASSO) algorithm, and a conventional logistic regression model (the LASSO model) was constructed from the selected features. Three machine learning models (gradient boosting machine, extreme gradient boosting and random forest) and the LASSO model were trained as predictive models. Model performance was assessed by the area under the receiver operating characteristic (ROC) curve (AUC). Model interpretability was analyzed with feature importance evaluation, Shapley additive explanations (SHAP) and force plots. The best-performing model was deployed as an online application with the Streamlit framework (V1.36.0). Results: Among the 862 patients, 158 (18.33%) developed stone recurrence after ERCP. There were no statistically significant differences in demographics, medical history, ERCP procedural records or laboratory indices between the training set and the validation set (P>0.05).
LASSO regression identified 6 key variables influencing stone recurrence (in descending order of significance: endoscopic sphincterotomy, common bile duct angulation, stone diameter, stone count, common bile duct diameter, and periampullary diverticulum). ROC curve analysis showed that the random forest model had the highest predictive performance, with the largest AUC (0.900). SHAP analysis showed that common bile duct angulation, common bile duct diameter, stone diameter, endoscopic sphincterotomy and stone count were the top 5 contributing factors in the random forest model. The random forest model was implemented in Python as a Streamlit-based application with a user-friendly visual interface that provides the predicted outcome, confidence level, a SHAP force plot and health recommendations. In the test set, the application achieved an accuracy of 84.5% (87/103), a sensitivity of 82.6% (19/23), and a specificity of 85.0% (68/80). SHAP plots and force plots illustrated the impact of key features on the stone recurrence prediction, giving a clear visualization of each variable's role within the model. Conclusions: The predictive model and application program based on the random forest algorithm demonstrate excellent predictive performance and practical usability in predicting stone recurrence after ERCP in patients with CBDS.