Prediction of pathological type of early lung adenocarcinoma using machine learning based on SHOX2 and RASSF1A methylation levels
- VernacularTitle:基于SHOX2和RASSF1A甲基化水平的机器学习算法预测早期肺腺癌病理类型
- Author:Runqi HUANG (1); Guangliang QIANG (2); Yifei LIU (3); Jiahai SHI (4)
- Author Information:
1. Research Center of Clinical Medicine, Affiliated Hospital of Nantong University, Nantong, 226001, Jiangsu, P. R. China
2. Department of Thoracic Surgery, Peking University Third Hospital, Beijing, 100191, P. R. China
3. Department of Pathology, Affiliated Hospital of Nantong University, Nantong, 226001, Jiangsu, P. R. China
4. Department of Cardiothoracic Surgery, Affiliated Hospital of Nantong University, Nantong, 226001, Jiangsu, P. R. China
- Publication Type:Journal Article
- Keywords:Lung adenocarcinoma; SHOX2; RASSF1A; methylation; invasiveness
- From:Chinese Journal of Clinical Thoracic and Cardiovascular Surgery 2025;32(01):67-72
- Country:China
- Language:Chinese
- Abstract:
Objective To explore the accuracy of machine learning algorithms based on SHOX2 and RASSF1A methylation levels in predicting the pathological type of early-stage lung adenocarcinoma.
Methods A retrospective analysis was conducted on formalin-fixed paraffin-embedded (FFPE) specimens from patients who underwent lung tumor resection at the Affiliated Hospital of Nantong University from January 2021 to January 2023. According to the pathological classification of their tumors, patients were divided into three groups: a benign tumor/adenocarcinoma in situ (BT/AIS) group, a minimally invasive adenocarcinoma (MIA) group, and an invasive adenocarcinoma (IA) group. SHOX2 and RASSF1A methylation levels in the FFPE specimens were measured by methylation-specific PCR (MS-PCR) using the LungMe kit. With these two methylation levels as predictors, several machine learning algorithms, including logistic regression, XGBoost, random forest, and naive Bayes, were used to predict the pathological type.
Results A total of 272 patients were included. The average ages of patients in the BT/AIS, MIA, and IA groups were 57.97, 61.31, and 63.84 years, and the proportions of female patients were 55.38%, 61.11%, and 61.36%, respectively. Among the prediction models built on SHOX2 and RASSF1A methylation levels, the random forest and XGBoost models performed well for every pathological type. The C-statistics of the random forest model for the BT/AIS, MIA, and IA groups were 0.71, 0.72, and 0.78, respectively, and those of the XGBoost model were 0.70, 0.75, and 0.77, respectively. The naive Bayes model performed robustly only for the IA group (C-statistic 0.73), indicating some predictive ability. The logistic regression model performed worst, showing no predictive ability for any group.
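The abstract reports one C-statistic per pathological group from a three-group model. A common way to obtain such figures, assumed here rather than confirmed by the abstract, is a one-vs-rest evaluation: each group is treated in turn as the positive class, and the C-statistic is the probability that a randomly chosen member of that group receives a higher predicted probability for it than a randomly chosen non-member. The minimal sketch below computes this from any classifier's predicted group probabilities; the function names `c_statistic` and `one_vs_rest_c` are hypothetical, and the code does not reproduce the study's random forest or XGBoost models.

```python
from itertools import product


def c_statistic(pos_scores, neg_scores):
    """Concordance (C-statistic, equivalent to ROC AUC) for one binary split.

    Fraction of (positive, negative) pairs in which the positive case has
    the higher score; tied pairs count as half-concordant.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            concordant += 1.0
        elif p == n:
            concordant += 0.5
    return concordant / (len(pos_scores) * len(neg_scores))


def one_vs_rest_c(labels, probs, classes):
    """One C-statistic per group, that group positive and all others negative.

    Assumption: one-vs-rest evaluation (not stated in the abstract).
    labels: true group of each patient, e.g. "BT/AIS", "MIA" or "IA".
    probs:  one dict per patient mapping each group to its predicted probability.
    """
    result = {}
    for cls in classes:
        pos = [p[cls] for y, p in zip(labels, probs) if y == cls]
        neg = [p[cls] for y, p in zip(labels, probs) if y != cls]
        result[cls] = c_statistic(pos, neg)
    return result
```

In practice `probs` would come from a model fitted on the two predictors (SHOX2 and RASSF1A methylation levels) and scored on patients not used for fitting; the abstract does not describe the validation scheme, so that step is left out. The pairwise loop costs O(positives × negatives), which is negligible for a cohort of a few hundred patients.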
In decision curve analysis, the random forest model yielded a higher net benefit for predicting the BT/AIS and MIA pathological types, indicating its potential value in clinical application.
Conclusion Machine learning algorithms based on SHOX2 and RASSF1A methylation levels have high accuracy in predicting the pathological type of early-stage lung adenocarcinoma.
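Decision curve analysis compares prediction models by net benefit at a threshold probability p_t, conventionally NB = TP/n - (FP/n) × p_t/(1 - p_t), and sets each model's curve against the "treat all" and "treat none" (NB = 0) strategies. The minimal sketch below computes both quantities for one binary outcome. It assumes a one-vs-rest framing such as "MIA vs. other groups", since the abstract does not say how the three groups were handled, and the function names are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted probability >= threshold.

    y_true: 0/1 outcomes (1 = the pathological type of interest; one-vs-rest is an assumption).
    y_prob: predicted probabilities for that type, one per patient.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    odds = threshold / (1 - threshold)  # weight of a false positive relative to a true positive
    return tp / n - (fp / n) * odds


def treat_all_net_benefit(y_true, threshold):
    """Net benefit of the reference strategy that treats every patient as positive."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    odds = threshold / (1 - threshold)
    return prevalence - (1 - prevalence) * odds
```

As an illustration with made-up data (not from the study): if 4 of 10 patients are positive and, at p_t = 0.2, a model flags 4 true and 4 false positives, then NB = 0.4 - 0.4 × 0.25 = 0.30, above the treat-all value of 0.4 - 0.6 × 0.25 = 0.25. Evaluating both functions over a grid of thresholds yields the decision curves.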