Development and validation of a machine-learning model based on routine laboratory parameters for preoperative prediction of microvascular invasion in patients with hepatocellular carcinoma

Zhou YU; Lijin LIN; Yazhi CHEN; Tiansheng LIN; Qishui OU; Jinlan HUANG

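VernacularTitle: Construction and validation of a prediction model for preoperative microvascular invasion in hepatocellular carcinoma based on laboratory data and machine learning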
Author: Zhou YU¹; Lijin LIN; Yazhi CHEN; Tiansheng LIN; Qishui OU; Jinlan HUANG
Author Information

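1. Department of Laboratory Medicine, The First Affiliated Hospital of Fujian Medical University; Fujian Key Laboratory of Laboratory Medicine; Gene Diagnostic Research Center, Fujian Medical University; Fujian Provincial Clinical Research Center for Clinical Immunology Laboratory Testing; Fuzhou 350005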
Publication Type: Journal Article
Keywords: Liver neoplasms; Hepatocellular carcinoma; Microvascular invasion; Prediction model; XGBoost; Machine learning
From: Chinese Journal of Laboratory Medicine 2025;48(1):65-75
Country: China
Language: Chinese
Abstract: Objective: To develop and validate a noninvasive machine learning (ML) model based on routine laboratory parameters for the preoperative prediction of microvascular invasion (MVI) in patients with hepatocellular carcinoma (HCC).
Methods: A total of 629 HCC patients who underwent hepatectomy at the First Affiliated Hospital of Fujian Medical University between January 2019 and December 2023 were retrospectively enrolled and divided chronologically into a training set (n=464) and an internal validation set (n=165). A cohort of 190 HCC patients from Fujian Provincial Hospital was used as an external validation set. Preoperative demographic features, tumor size, and routine laboratory data were collected, and patients were classified into MVI-positive and MVI-negative groups. The Boruta algorithm and LASSO regression were used to select relevant features in the training set. Eight ML algorithms, namely multivariate logistic regression, decision tree (DT), random forest (RF), extreme gradient boosting (XGBoost), k-nearest neighbors (KNN), support vector machine (SVM), light gradient boosting machine (LGBM), and naive Bayes, were used to construct prediction models. Their predictive performance in the training and internal validation sets was evaluated with receiver operating characteristic (ROC) curves and the area under the curve (AUC). The model with the highest AUC was defined as the optimal model, and its performance was further validated in the external validation set. Calibration curves and decision curve analysis (DCA) were used to assess calibration and clinical utility: the calibration curve showed that predicted probabilities were close to the observed probabilities, and DCA showed that the model yielded a net benefit within the threshold probability range of 0.3–0.8.
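The AUC used to rank the eight models can be read as the probability that a randomly chosen MVI-positive patient receives a higher predicted score than a randomly chosen MVI-negative patient (ties count as one half). The following is a minimal Python sketch of that rank-based definition and of picking the model with the highest AUC; the function name `roc_auc`, the model names, and all numbers are hypothetical illustrations, not the study's code or data.

```python
def roc_auc(labels, scores):
    """AUC as the Mann-Whitney probability P(score_pos > score_neg); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # MVI-positive scores
    neg = [s for y, s in zip(labels, scores) if y == 0]  # MVI-negative scores
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # pair ranked correctly
            elif p == n:
                wins += 0.5  # tied pair counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical validation labels: 1 = MVI-positive, 0 = MVI-negative
labels = [1, 1, 0, 0, 1, 0]

# Hypothetical predicted probabilities from two candidate models
model_scores = {
    "model_a": [0.9, 0.6, 0.4, 0.7, 0.8, 0.2],
    "model_b": [0.5, 0.4, 0.6, 0.3, 0.7, 0.1],
}

# Score every candidate, then keep the one with the highest AUC
aucs = {name: roc_auc(labels, s) for name, s in model_scores.items()}
best = max(aucs, key=aucs.get)
print(aucs, best)
```

The pairwise loop is O(n_pos × n_neg) and is meant for readability; a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently on real cohorts.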
Results: After screening, eight parameters were selected to construct the preoperative MVI prediction model: α-fetoprotein (AFP), protein induced by vitamin K absence-Ⅱ (PIVKA-Ⅱ), tumor size, eosinophil count, neutrophil count, creatinine, apolipoprotein A1 (ApoA1), and total bilirubin. Among the eight ML algorithms tested, XGBoost performed best, with AUCs of 0.820, 0.803, and 0.758 in the training, internal validation, and external validation sets, respectively. In stratified analyses, the AUC of the XGBoost model was 0.817 in patients positive for hepatitis B surface antigen, 0.779 in male patients, and 0.790 in older patients. The calibration curves showed good agreement between observed and predicted values, and decision curve analysis showed a relatively high net benefit.
Conclusions: We established and validated a novel XGBoost model based on eight preoperative parameters (seven routine laboratory indicators plus tumor size) that predicts MVI in HCC before surgery with relatively high and reliable accuracy.
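Decision curve analysis asks, for each threshold probability pt, whether treating patients whose predicted MVI probability is at least pt beats the default strategies of "treat all" and "treat none" (net benefit 0). The standard net benefit is NB(pt) = TP/n − (FP/n) · pt/(1 − pt). Below is a minimal Python sketch of that formula over the 0.3–0.8 range reported above; the helper names, the `>=` cut-off convention, and the toy labels and probabilities are hypothetical, not taken from the study.

```python
def net_benefit(labels, probs, pt):
    """Net benefit of calling patients with predicted probability >= pt positive."""
    if not 0 < pt < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= pt and y == 0)
    odds = pt / (1 - pt)  # weight of a false positive relative to a true positive
    return tp / n - fp / n * odds


def net_benefit_treat_all(labels, pt):
    """Net benefit of treating every patient as MVI-positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Hypothetical cohort: 1 = MVI-positive, 0 = MVI-negative
labels = [1, 1, 0, 0, 1, 0, 0, 0]
probs = [0.9, 0.7, 0.6, 0.2, 0.55, 0.1, 0.35, 0.4]

for pt in (0.3, 0.4, 0.5, 0.6, 0.7, 0.8):
    model = net_benefit(labels, probs, pt)
    treat_all = net_benefit_treat_all(labels, pt)
    # The model is clinically useful at pt if it beats both reference strategies
    useful = model > max(treat_all, 0.0)
    print(f"pt={pt:.1f} model={model:.3f} treat_all={treat_all:.3f} useful={useful}")
```

Plotting `model`, `treat_all`, and zero against pt gives the familiar decision curve; the range where the model curve lies above both references is the range of thresholds over which using the model yields a net benefit.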