Development and clinical application of a machine learning-driven model for metabolite-based diagnosis of small cell lung cancer
10.3969/j.issn.1674-8115.2025.08.008
- VernacularTitle:基于机器学习的小细胞肺癌代谢分子诊断模型的建立和临床应用
- Author: Xin HUANG 1; Jiahui LIU 1; Jingwen YE 1; Wenli QIAN 1; Wanxing XU 1; Lin WANG 1
- Author Information:
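1. Center for Laboratory Medicine, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200080; School of Medical Technology, Shanghai Jiao Tong University School of Medicine, Shanghai 200025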
- Publication Type: Journal Article
- Keywords: small cell lung cancer (SCLC); diagnosis model; machine learning; metabolomics; liquid chromatography-tandem mass spectrometry (LC-MS/MS)
- From: Journal of Shanghai Jiaotong University (Medical Science) 2025;45(8):1009-1016
- Country: China
- Language: Chinese
- Abstract:
Objective · To develop an early diagnostic model for small cell lung cancer (SCLC) based on differences in serum metabolite expression profiles between patients with SCLC and those with benign pulmonary diseases, using machine learning algorithms.
Methods · Serum samples were collected from 29 SCLC patients and 67 patients with benign lung diseases at Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, as the training cohort. An independent external validation cohort included 20 SCLC patients and 40 patients with benign lung diseases from Gansu Provincial Cancer Hospital. A total of 69 serum metabolites were quantitatively analyzed using liquid chromatography-tandem mass spectrometry (LC-MS/MS). The XGBoost Classifier was employed to rank metabolite importance, and a forward feature selection strategy based on XGBoost was used to identify a subset of key metabolites. Diagnostic models were constructed using AdaBoost, random forest (RF), and light gradient boosting machine (LGBM) algorithms. Model performance was assessed using receiver operating characteristic (ROC) curves and the area under the curve (AUC), and validated on the external test cohort.
Results · Principal component analysis (PCA) and orthogonal projections to latent structures-discriminant analysis (OPLS-DA) of the training cohort revealed distinct metabolic profiles between SCLC and benign lung disease patients. Based on feature importance rankings, six key metabolites were selected to construct the MTB-6 diagnostic model. Among the models, AdaBoost achieved the best performance, with an AUC of 0.943, sensitivity of 75.0%, and specificity of 90.9% in the training cohort. In the external test cohort, the model demonstrated robust performance with an AUC of 0.921, sensitivity of 80.0%, and specificity of 87.5%.
Conclusion · The MTB-6 model, based on six serum metabolites and the AdaBoost algorithm, exhibits excellent diagnostic performance and holds potential for the differential diagnosis of SCLC and benign pulmonary diseases.
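The evaluation metrics named in the abstract (sensitivity, specificity, and AUC) can be illustrated with a minimal Python sketch. It is not the authors' code. The abstract does not report a confusion matrix, so the counts below are an assumption inferred from the external cohort sizes (20 SCLC, 40 benign) and the reported percentages: 16 of 20 and 35 of 40 correct classifications would give 80.0% and 87.5%. The function names and the toy scores are hypothetical, and the toy AUC is unrelated to the reported 0.921.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# External cohort sizes are from the abstract: 20 SCLC, 40 benign.
# The split into correct and incorrect calls is an inferred assumption.
sens, spec = sensitivity_specificity(tp=16, fn=4, tn=35, fp=5)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")

# Hypothetical toy data, only to show how an AUC is computed from scores.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC={toy_auc:.3f}")
```

The pairwise definition of AUC used here is equivalent to the area under the ROC curve and is practical for cohorts of this size. The training-cohort percentages are not reproduced because the abstract does not state which evaluation split produced them.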