Predictive model for severe adverse reaction associated with bevacizumab based on the global trigger tool and machine learning
- VernacularTitle:基于全面触发工具与机器学习的贝伐珠单抗严重不良反应预测模型研究
- Author:
Yongfei FU (1);
Xin LONG (1);
Hongzhen XU (2);
Jian TANG (2);
Xiangqing LI (3);
Yucheng LONG (2);
Dong QIN (4)
- Author Information:
1. College of Pharmacy, Guilin Medical University, Guangxi Guilin 541199, China; Dept. of Pharmacy, Guilin People’s Hospital, Guangxi Guilin 541002, China
2. Dept. of Pharmacy, Guilin People’s Hospital, Guangxi Guilin 541002, China
3. Data Center, Guilin People’s Hospital, Guangxi Guilin 541002, China
4. College of Pharmacy, Guilin Medical University, Guangxi Guilin 541199, China
- Publication Type:Journal Article
- Keywords:
bevacizumab;
adverse drug reaction;
global trigger tool;
machine learning;
predictive model
- From:
China Pharmacy
2026;37(4):497-503
- Country:China
- Language:Chinese
- Abstract:
OBJECTIVE To establish trigger items for bevacizumab-induced adverse drug reactions (ADRs), to identify and analyze the occurrence of these ADRs, and to build a predictive model for severe adverse reactions (SARs) caused by the drug.
METHODS Based on global trigger tool (GTT) theory, and with reference to the GTT White Paper, drug package inserts and the relevant literature, trigger items for bevacizumab-related ADRs were established using a single-round Delphi method. Using these items, the electronic medical records of relevant patients at Guilin People’s Hospital from January 2020 to September 2024 were actively screened via the China Hospital Pharmacovigilance System, and pharmacists then identified and counted the bevacizumab-induced ADRs. Patients with at least one positive trigger item served as the study subjects and were divided into training and test sets at a ratio of 7:3. Candidate feature variables were selected from 39 related variables using the Boruta algorithm, and multivariable logistic regression analysis was performed with the occurrence of SAR as the dependent variable. Based on the candidate features, Logistic Regression, Extreme Gradient Boosting, Light Gradient Boosting Machine, Random Forest, and Categorical Boosting models were constructed. Model performance was evaluated using metrics including the area under the receiver operating characteristic curve (AUC) and recall. The Shapley Additive exPlanations (SHAP) method was applied to interpret the contribution of each variable. A nomogram was constructed based on the optimal model.
RESULTS A total of 38 trigger items for active monitoring of bevacizumab-related ADRs were determined, comprising 17 laboratory indicators, 13 clinical manifestations, and 8 intervention measures. In total, 483 patients with positive trigger items were included, and 318 patients with bevacizumab-induced ADRs were identified, including 83 SARs. The positive predictive values at the trigger-item level and the case level were 43.57% (708/1 625) and 65.84% (318/483), respectively. Bevacizumab-induced ADRs involved 7 systems/organs, with the hematological system being the most frequently involved (64.15%). The Boruta algorithm selected 7 variables: serum potassium, hematocrit, albumin-to-globulin ratio, prealbumin, hypertension history, age and red blood cell count. Multivariable logistic regression showed that elevated serum potassium was associated with a decreased risk of bevacizumab-induced SAR (OR=0.234, P=0.002), while a history of hypertension (OR=2.642, P=0.006) and increased age (OR=1.040, P=0.025) were associated with an increased risk. The Logistic Regression model outperformed the other models, with the highest AUC, F1 score and recall (0.761, 0.447 and 0.607, respectively). SHAP analysis indicated that serum potassium, hematocrit, and age ranked highest in importance.
CONCLUSIONS A total of 38 trigger items have been established for active screening of bevacizumab-related ADRs. Elevated serum potassium is a protective factor against bevacizumab-induced SAR, whereas a history of hypertension and increased age are risk factors. The Logistic Regression model performed best among the models compared.
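The positive predictive value (PPV) reported in the RESULTS is the share of flagged positives that pharmacists confirmed as true ADRs. As a minimal sketch, the two PPVs can be recomputed from the counts given in parentheses in the abstract. The function name `positive_predictive_value` is a hypothetical helper introduced here for illustration and is not taken from the study.

```python
def positive_predictive_value(confirmed: int, flagged: int) -> float:
    """Return PPV as a percentage: confirmed positives / all flagged positives."""
    if flagged <= 0:
        raise ValueError("flagged must be a positive count")
    return 100.0 * confirmed / flagged


# Counts as given in the abstract (numerator / denominator).
ppv_triggers = positive_predictive_value(708, 1625)  # trigger-item level
ppv_cases = positive_predictive_value(318, 483)      # case (patient) level

print(f"Trigger-level PPV: {ppv_triggers:.2f}%")
print(f"Case-level PPV: {ppv_cases:.2f}%")
```

Taking the counts rather than the percentages as input keeps the calculation tied to the raw figures, so each reported percentage can be checked against its own fraction.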