Development of a lung cancer prediction model based on peripheral blood indicators using machine learning algorithms
10.3760/cma.j.cn114452-20250919-00523
- VernacularTitle:基于机器学习算法的外周血指标肺癌预测模型的建立
- Author:
Qiangqiang JIN 1;
Yanling LIU 1;
Xinyu ZHANG 1;
Haiting MAO 1
Author Information
1. Center for Laboratory Medicine, Qilu Second Hospital of Shandong University, Jinan 250033, China
- Publication Type:Journal Article
- Keywords:
Lung neoplasms;
Machine learning;
Predictive model
- From:
Chinese Journal of Laboratory Medicine
2025;48(12):1528-1534
- Country:China
- Language:Chinese
- Abstract:
Objective: To construct and validate a novel lung cancer prediction model based on peripheral blood indicators, using machine learning algorithms, for lung cancer risk assessment.
Methods: In this retrospective case-control study, clinical data were collected from 194 patients with newly diagnosed lung cancer [mean age: (66.80±9.09) years; 126 males and 68 females] admitted to Qilu Second Hospital of Shandong University between January 9, 2020, and December 31, 2024, who served as the case group. During the same period, 290 healthy individuals undergoing physical examinations [mean age: (61.18±14.31) years; 155 males and 135 females] were enrolled as the control group. A total of 46 peripheral blood indicators (including routine blood tests, coagulation parameters, liver function markers, and tumor-related indices), along with two basic characteristics (age and sex), were included in the analysis. Eleven machine learning algorithms (logistic regression, random forest, support vector classifier, extreme gradient boosting, gradient boosting decision tree, decision tree, multilayer perceptron, linear discriminant analysis, adaptive boosting, Gaussian naive Bayes, and light gradient-boosting machine) were trained for early diagnosis of lung cancer. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC), accuracy, positive predictive value, negative predictive value, and F1-score, each with its 95% confidence interval (95% CI). The best-performing algorithm was selected, and feature importance was ranked with SHapley Additive exPlanations (SHAP) values.
Results: The support vector classifier achieved the best performance for predicting lung cancer risk (AUC=0.974; 95% CI 0.951-0.989) and was retained for final model establishment. After 20 rounds of stratified 10-fold cross-validation, the mean AUC was 0.950; learning-curve, decision-curve, and calibration analyses confirmed its superior generalizability, clinical utility, and calibration. SHAP values and decision-tree feature importance consistently identified neuron-specific enolase, carcinoembryonic antigen, and squamous cell carcinoma antigen as the three most critical predictors of lung cancer risk.
Conclusion: A lung cancer prediction model based on a support vector machine (SVM) was successfully established to determine the risk of developing lung cancer.
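For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how accuracy, positive predictive value, negative predictive value, F1-score, and AUC can be computed from a classifier's outputs. It is not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and none of the numbers come from the study.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, with 1 = case and 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, PPV, NPV, and F1 from hard predictions (0.0 when a ratio is undefined)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    npv = tn / (tn + fn) if tn + fn else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity) if ppv + sensitivity else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "ppv": ppv, "npv": npv, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random control (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 cases and 6 controls with made-up model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, round(auc, 4))
```

The pairwise AUC computation is quadratic in sample size, which is acceptable for a cohort of a few hundred subjects; a rank-based formulation would be preferable for larger data sets. Confidence intervals, which the abstract reports but does not describe a method for, are left out of this sketch.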