Construction of machine learning classification prediction model for vancomycin blood concentrations based on MIMIC-Ⅳ database

Xiaohui LIN; Yujia WANG; Lingling ZHANG; Shuanglin XU


Vernacular Title: 基于MIMIC-Ⅳ数据库的万古霉素血药谷浓度机器学习分类预测模型构建
Authors: Xiaohui LIN¹; Yujia WANG¹; Lingling ZHANG¹; Shuanglin XU¹
Author Information

1. Dept. of Clinical Pharmacy, Ningde Municipal Hospital Affiliated to Ningde Normal University, Fujian Ningde 352100, China
Publication Type: Journal Article
Keywords: machine learning; vancomycin; blood concentration; MIMIC-Ⅳ database; classification prediction
From: China Pharmacy 2025;36(19):2448-2453
Country: China
Language: Chinese
Abstract: OBJECTIVE To construct a classification prediction model for vancomycin blood concentrations in order to optimize precision dosing strategies for the drug. METHODS Patient records meeting the inclusion criteria were extracted from the Medical Information Mart for Intensive Care (MIMIC-Ⅳ) database. After data cleaning and preprocessing, a final cohort of 9,902 patients was analyzed. Feature selection was performed using correlation analysis and the Boruta feature selection algorithm. Vancomycin blood concentrations were discretized into three categories based on clinical therapeutic windows: low (<10 μg/mL), intermediate (10-20 μg/mL), and high (≥20 μg/mL). Six machine learning algorithms were used to construct classification models: tabular prior-data fitted network (TabPFN), logistic regression (LR), random forest (RF), extreme gradient boosting (XGBoost), support vector machine (SVM), and K-nearest neighbors (KNN). Model performance was evaluated using 10-fold cross-validation (10-CV), with the following primary metrics: accuracy, balanced accuracy, macro-averaged precision, macro-averaged recall, macro F1, and one-vs-rest area under the receiver operating characteristic curve (OvR-AUC). Shapley Additive Explanations (SHAP) was used to analyze the direction and magnitude of each feature's impact on the model's predictions. RESULTS The RF and TabPFN models performed best (accuracy of 0.7414 and 0.7377, and OvR-AUC of 0.9070 and 0.8958, respectively). The XGBoost model showed moderate performance, while the LR, SVM, and KNN models performed relatively poorly. Confusion matrix heatmap analysis showed that both RF and TabPFN achieved higher accuracy in predicting high-concentration cases but slightly lower performance in the low and intermediate concentration categories. Bootstrap resampling with 10-CV showed that the RF model performed stably across the evaluation metrics (accuracy: 0.7414; balanced accuracy: 0.7403; macro-averaged precision: 0.7321; macro-averaged recall: 0.7360; macro F1: 0.7360; OvR-AUC: 0.9070), indicating good classification performance and generalization ability. SHAP analysis identified creatinine, urea nitrogen, and the daily cumulative dose and administration frequency of vancomycin as key predictors with a significant impact on the prediction results. CONCLUSIONS The RF and TabPFN models show certain advantages in the classification prediction of vancomycin trough blood concentrations; however, their performance in the low and intermediate concentration categories still requires improvement.
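
As an illustration of the discretization step and the macro-averaged metrics named in the abstract, the following is a minimal Python sketch, not the authors' code. The cut-offs (<10, 10-20, ≥20 μg/mL) come from the abstract. The function names `discretize` and `macro_metrics`, the sample concentrations, and the sample predictions are hypothetical and chosen only for demonstration. The sketch assumes the usual definition of macro averaging (the unweighted mean of per-class values).

```python
from collections import Counter

# Cut-offs in μg/mL, as stated in the abstract:
# low < 10, intermediate 10-20, high >= 20.
LOW_CUTOFF = 10.0
HIGH_CUTOFF = 20.0
LABELS = ("low", "intermediate", "high")


def discretize(concentration_ug_ml):
    """Map a vancomycin concentration (μg/mL) to one of three classes."""
    if concentration_ug_ml < LOW_CUTOFF:
        return "low"
    if concentration_ug_ml < HIGH_CUTOFF:
        return "intermediate"
    return "high"


def macro_metrics(y_true, y_pred, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging is assumed to mean the unweighted mean of the
    per-class values, so every class counts equally regardless of size.
    """
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = pairs[(label, label)]
        predicted = sum(n for (_, p), n in pairs.items() if p == label)
        actual = sum(n for (t, _), n in pairs.items() if t == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n_classes = len(labels)
    correct = sum(pairs[(label, label)] for label in labels)
    return {
        "accuracy": correct / len(y_true),
        "precision_macro": sum(precisions) / n_classes,
        "recall_macro": sum(recalls) / n_classes,
        "macro_f1": sum(f1s) / n_classes,
    }


# Hypothetical measured concentrations and model predictions (illustration only).
measured = [6.2, 12.5, 25.1, 18.0, 9.9, 20.0]
y_true = [discretize(c) for c in measured]
y_pred = ["low", "intermediate", "high", "high", "intermediate", "high"]
metrics = macro_metrics(y_true, y_pred)
print(metrics)
```

Because macro averaging weights the three classes equally, weak performance in the low or intermediate class lowers these scores even when the high class is predicted well, which is consistent with the limitation reported in the conclusions.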