Development of a lung cancer prediction model based on peripheral blood indicators using machine learning algorithms

Qiangqiang JIN; Yanling LIU; Xinyu ZHANG; Haiting MAO

VernacularTitle:基于机器学习算法的外周血指标肺癌预测模型的建立
Author: Qiangqiang JIN ¹ ; Yanling LIU ¹ ; Xinyu ZHANG ¹ ; Haiting MAO ¹
Author Information

1. Center of Laboratory Medicine, Qilu Second Hospital of Shandong University, Jinan 250033
Publication Type: Journal Article
Keywords: Lung neoplasms; Machine learning; Predictive model
From: Chinese Journal of Laboratory Medicine 2025;48(12):1528-1534
Country: China
Language: Chinese
Abstract: Objective: By analyzing peripheral blood indicators, we constructed and validated a novel machine learning-based prediction model for lung cancer risk assessment. Methods: A retrospective case-control study was conducted using the clinical data of 194 newly diagnosed lung cancer patients [mean age: (66.80±9.09) years; 126 males and 68 females] admitted to Qilu Second Hospital of Shandong University between January 9, 2020, and December 31, 2024, who served as the case group. During the same period, 290 healthy individuals undergoing physical examinations [mean age: (61.18±14.31) years; 155 males and 135 females] were enrolled as the control group. A total of 46 peripheral blood indicators (including routine blood tests, coagulation parameters, liver function markers, and tumor-related indices), along with two basic characteristics (age and sex), were included in the analysis. Eleven machine learning algorithms (logistic regression, random forest, support vector classifier, extreme gradient boosting, gradient boosting decision tree, decision tree, multilayer perceptron, linear discriminant analysis, adaptive boosting, Gaussian naive Bayes, and light gradient-boosting machine) were trained for early diagnosis of lung cancer. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), accuracy, positive predictive value, negative predictive value, and F1-score, together with 95% confidence intervals (95% CI). The best-performing algorithm was selected, and feature importance was ranked using SHapley Additive exPlanations (SHAP) values. Results: The support vector classifier achieved the best performance for predicting lung cancer risk (AUC=0.974; 95% CI 0.951-0.989) and was retained for final model establishment.
After 20 rounds of stratified 10-fold cross-validation, the mean AUC was 0.950; learning-curve, decision-curve, and calibration analyses confirmed its superior generalizability, clinical utility, and calibration. SHapley Additive exPlanations (SHAP) values and decision-tree feature importance consistently identified neuron-specific enolase, carcinoembryonic antigen, and squamous cell carcinoma antigen as the three most critical predictors of lung cancer risk. Conclusion: A support vector machine (SVM)-based lung cancer prediction model was successfully established to assess the risk of developing lung cancer.
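
Illustrative note: the abstract reports the AUC with a 95% CI but does not say how the interval was derived. The sketch below shows one common approach, a percentile bootstrap around a rank-based AUC, using only the Python standard library. The function names (`auc_score`, `bootstrap_auc_ci`), the bootstrap method itself, and the toy labels and scores are assumptions made for illustration; they are not the authors' code or study data.

```python
import random


def auc_score(labels, scores):
    """AUC as the fraction of case-control pairs in which the case
    receives the higher score (tied pairs count as 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for h in controls:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile bootstrap CI for the AUC.
    Assumed method: the abstract does not state how its CI was computed."""
    point = auc_score(labels, scores)  # also validates that both classes exist
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples containing only one class
            stats.append(auc_score(ys, [scores[i] for i in idx]))
    stats.sort()
    k = int(n_boot * alpha / 2)  # number of resamples trimmed from each tail
    return point, stats[k], stats[n_boot - 1 - k]


# Toy data (hypothetical): 1 = lung cancer case, 0 = healthy control.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

point, lower, upper = bootstrap_auc_ci(labels, scores)
print(f"AUC = {point:.4f}, 95% CI {lower:.3f}-{upper:.3f}")
```

In the toy data, 15 of the 16 case-control pairs are ranked correctly, so the point estimate is 0.9375. With real data, the scores would be the classifier's decision values on held-out samples rather than hand-picked numbers.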