Development and validation of a diagnostic model based on machine learning algorithms for the development of interstitial lung diseases in patients with rheumatoid arthritis
10.3760/cma.j.cn141217-20230621-00164
- VernacularTitle:基于机器学习算法开发与验证类风湿关节炎患者发生间质性肺疾病的早期诊断模型
- Author:
Yancong NIE 1;
Yanqing JIN;
Meilin YIN;
Xiaoxia WANG;
Lixia QIU
- Author Information:
1. School of Public Health, Shanxi Medical University, Taiyuan 030001
- Keywords:
Arthritis, rheumatoid;
Lung diseases, interstitial;
Machine learning;
Diagnostic model
- From:
Chinese Journal of Rheumatology
2024;28(3):167-175
- Country:China
- Language:Chinese
- Abstract:
Objective: To screen factors that may influence the development of interstitial lung disease (ILD) in patients with rheumatoid arthritis (RA), and to construct and validate a model for early diagnosis. Methods: The study included 712 RA patients treated in the Department of Rheumatology and Immunology of the Second Hospital of Shanxi Medical University from December 2019 to October 2022. Fifty-two variables, including demographic data, clinical symptoms, and laboratory indexes, were collected. Patients were divided into an RA-only group and an RA-ILD group according to whether ILD was present. After data preprocessing, subjects were randomly assigned to a modeling group and a validation group in a 7:3 ratio. Univariate analysis was used to compare the baseline characteristics of the two groups. Feature selection was performed using the LASSO regression and SVM-RFE algorithms. The selected indicators were analyzed by logistic regression, and the results were used to develop a nomogram model for the early diagnosis of RA complicated by ILD; model performance was assessed internally in the modeling group, and internal validation was performed with data from the validation group. Results: A total of 712 subjects were included, 498 in the modeling group and 214 in the validation group. Univariate analysis showed statistically significant differences between the two groups (P<0.05) in 18 characteristic indexes: male sex, age, smoking history, drinking history, number of swollen joints, number of painful joints, use of prednisone, WBC, ESR, CRP, IL-2, IL-10, IL-17, TNF-α, IFN-γ, AFA family, APF, and serum albumin.
The LASSO algorithm identified 13 risk variables for RA-ILD and the SVM-RFE algorithm identified 12; the intersecting risk variables were male sex, age, history of alcohol consumption, number of painful joints, prednisone acetate, IL-2, AFA family, TNF-α, serum albumin, and IL-10. Multivariate logistic regression analysis confirmed that male sex [OR(95% CI)=3.61(2.11, 6.18)], age [OR(95% CI)=1.05(1.03, 1.08)], number of painful joints [OR(95% CI)=1.03(1.01, 1.06)], IL-2 [OR(95% CI)=0.91(0.84, 0.99)], and TNF-α [OR(95% CI)=1.06(1.02, 1.10)] were statistically significant (P<0.05) and were independent influencing factors for RA complicated by ILD. The calibration curves of the nomogram showed high accuracy in both the modeling and validation groups, and the model had high diagnostic power, as shown by the area under the receiver operating characteristic (ROC) curve (AUC) and by decision curve analysis (DCA): the modeling group had an AUC of 0.76 (95% CI=0.71, 0.81), with net benefit rates of 3%~82% and 93%~99%, whereas the validation group had an AUC of 0.71 (95% CI=0.64, 0.79), with net benefit rates of 5%~11%, 14%~60% and 85%~89%. Conclusion: Male sex, age, number of painful joints, IL-2, and TNF-α are independent factors for RA complicated by ILD, and the constructed nomogram model performs well in the early diagnosis of the disease.
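
As a reading aid, the reported numbers can be checked with a few lines of Python. The sketch below is not the authors' code. It assumes the confidence intervals are Wald intervals (symmetric on the log-odds scale, z = 1.96) and that the 7:3 split was rounded to the nearest whole subject; the helper names `split_sizes` and `coefficient_from_or` are hypothetical. It recovers the logistic regression coefficients (the quantities a nomogram is scored on) from the reported odds ratios.

```python
import math


def split_sizes(n_total, train_fraction=0.7):
    """Group sizes for a random split (assumes rounding to the nearest subject)."""
    n_train = round(n_total * train_fraction)
    return n_train, n_total - n_train


def coefficient_from_or(odds_ratio, ci_low, ci_high, z=1.96):
    """Log-odds coefficient and approximate standard error from an OR and its CI.

    Assumes a Wald interval: ln(OR) +/- z * SE.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


# OR (95% CI) values as reported in the abstract.
reported = {
    "male sex": (3.61, 2.11, 6.18),
    "age": (1.05, 1.03, 1.08),
    "painful joints": (1.03, 1.01, 1.06),
    "IL-2": (0.91, 0.84, 0.99),
    "TNF-alpha": (1.06, 1.02, 1.10),
}

coefficients = {name: coefficient_from_or(*vals) for name, vals in reported.items()}

if __name__ == "__main__":
    print("split of 712 subjects:", split_sizes(712))
    for name, (beta, se) in coefficients.items():
        print(f"{name}: beta={beta:.3f}, SE~{se:.3f}")
```

A negative coefficient (IL-2 here, OR below 1) means higher values are associated with lower odds of ILD; the other four factors have positive coefficients.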