1. A machine learning approach for the diagnosis of obstructive sleep apnoea using oximetry, demographic and anthropometric data.
Zhou Hao LEONG ; Shaun Ray Han LOH ; Leong Chai LEOW ; Thun How ONG ; Song Tar TOH
Singapore medical journal 2025;66(4):195-201
INTRODUCTION:
Obstructive sleep apnoea (OSA) is a serious but underdiagnosed condition. Demand for the gold standard diagnostic polysomnogram (PSG) far exceeds its availability. More efficient diagnostic methods are needed, even in tertiary settings. Machine learning (ML) models have strengths in disease prediction and early diagnosis. We explored the use of ML with oximetry, demographic and anthropometric data to diagnose OSA.
METHODS:
A total of 2,996 patients were included for modelling and divided into test and training sets. Seven commonly used supervised learning algorithms were trained with the data. Sensitivity (recall), specificity, positive predictive value (PPV) (precision), negative predictive value, area under the receiver operating characteristic curve (AUC) and F1 measure were reported for each model.
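Illustrative aside (not the authors' code): a minimal Python sketch, assuming scikit-learn and simulated placeholder data in place of the oximetry, demographic and anthropometric features, showing how several supervised classifiers can be trained and reported with the metrics listed above.

# Hypothetical sketch of a train/test split and metric report for several
# supervised classifiers; it does not reproduce the study's models or data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score, f1_score

# Placeholder data standing in for oximetry, demographic and anthropometric features.
X, y = make_classification(n_samples=2996, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "neural_network": MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    sensitivity = tp / (tp + fn)          # recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)                  # precision
    npv = tn / (tn + fn)
    print(f"{name}: sens={sensitivity:.2f} spec={specificity:.2f} "
          f"PPV={ppv:.2f} NPV={npv:.2f} "
          f"AUC={roc_auc_score(y_test, y_prob):.2f} F1={f1_score(y_test, y_pred):.2f}")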
RESULTS:
In the best performing four-class model (neural network model predicting no, mild, moderate or severe OSA), a prediction of moderate and/or severe disease had a combined PPV of 94%; one out of 335 patients had no OSA and 19 had mild OSA. In the best performing two-class model (logistic regression model predicting no-mild vs. moderate-severe OSA), the PPV for moderate-severe OSA was 92%; two out of 350 patients had no OSA and 26 had mild OSA.
CONCLUSION
Our study showed that the prediction of moderate-severe OSA in a tertiary setting with an ML approach is a viable option to facilitate early identification of OSA. Prospective studies with home-based oximeters and analysis of other oximetry variables are the next steps towards formal implementation.
Humans; Oximetry/methods*; Sleep Apnea, Obstructive/diagnosis*; Male; Female; Middle Aged; Machine Learning; Polysomnography; Adult; Anthropometry; ROC Curve; Aged; Algorithms; Predictive Value of Tests; Sensitivity and Specificity; Neural Networks, Computer; Demography
2. Use of deep learning model for paediatric elbow radiograph binomial classification: initial experience, performance and lessons learnt.
Mark Bangwei TAN ; Yuezhi Russ CHUA ; Qiao FAN ; Marielle Valerie FORTIER ; Peiqi Pearlly CHANG
Singapore medical journal 2025;66(4):208-214
INTRODUCTION:
In this study, we aimed to compare the performance of a convolutional neural network (CNN)-based deep learning model that was trained on a dataset of normal and abnormal paediatric elbow radiographs with that of paediatric emergency department (ED) physicians on a binomial classification task.
METHODS:
A total of 1,314 paediatric elbow lateral radiographs (patient mean age 8.2 years) were retrospectively retrieved and classified based on annotation as normal or abnormal (with pathology). They were then randomly partitioned into a development set (993 images); first and second tuning (validation) sets (109 and 100 images, respectively); and a test set (112 images). An artificial intelligence (AI) model was trained on the development set using the EfficientNet B1 network architecture. Its performance on the test set was compared to that of five physicians (inter-rater agreement: fair). The performance of the AI model and the physician group was compared using the McNemar test.
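Illustrative aside (not the study's implementation): a minimal PyTorch/torchvision sketch, assuming torchvision >= 0.13 and random tensors standing in for preprocessed radiographs and labels, of fine-tuning an ImageNet-pretrained EfficientNet-B1 with a single-logit head for binary classification.

# Hypothetical sketch of EfficientNet-B1 fine-tuning for a normal-vs-abnormal task;
# data tensors are random placeholders, not the study's radiographs.
import torch
import torch.nn as nn
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Start from ImageNet weights and replace the classifier head with a single logit.
net = models.efficientnet_b1(weights=models.EfficientNet_B1_Weights.IMAGENET1K_V1)
net.classifier[1] = nn.Linear(net.classifier[1].in_features, 1)
net = net.to(device)

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)

# One illustrative training step on a random batch standing in for preprocessed
# lateral elbow radiographs (3-channel, 224x224) and their binary labels.
images = torch.randn(4, 3, 224, 224, device=device)
labels = torch.randint(0, 2, (4,), device=device).float()

net.train()
optimizer.zero_grad()
logits = net(images).squeeze(1)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
print("one-step loss:", loss.item())

The paired comparison with physicians described above could then be run with a McNemar test on the models' and readers' correct/incorrect classification table (statsmodels, for example, provides one).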
RESULTS:
The accuracy of the AI model on the test set was 80.4% (95% confidence interval [CI] 71.8%-87.3%), and the area under the receiver operating characteristic curve (AUROC) was 0.872 (95% CI 0.831-0.947). The performance of the AI model vs. the physician group on the test set was: sensitivity 79.0% (95% CI: 68.4%-89.5%) vs. 64.9% (95% CI: 52.5%-77.3%; P = 0.088); and specificity 81.8% (95% CI: 71.6%-92.0%) vs. 87.3% (95% CI: 78.5%-96.1%; P = 0.439).
CONCLUSION
The AI model showed a good AUROC and higher sensitivity than the physician group, with the difference approaching but not reaching statistical significance (P = 0.088).
Humans; Deep Learning; Child; Retrospective Studies; Male; Female; Radiography/methods*; ROC Curve; Elbow/diagnostic imaging*; Neural Networks, Computer; Child, Preschool; Elbow Joint/diagnostic imaging*; Emergency Service, Hospital; Adolescent; Infant; Artificial Intelligence
3. Development and multicenter validation of machine learning models for predicting postoperative pulmonary complications after neurosurgery.
Ming XU ; Wenhao ZHU ; Siyu HOU ; Hongzhi XU ; Jingwen XIA ; Liyu LIN ; Hao FU ; Mingyu YOU ; Jiafeng WANG ; Zhi XIE ; Xiaohong WEN ; Yingwei WANG
Chinese Medical Journal 2025;138(17):2170-2179
BACKGROUND:
Postoperative pulmonary complications (PPCs) are major adverse events in neurosurgical patients. This study aimed to develop and validate machine learning models predicting PPCs after neurosurgery.
METHODS:
PPCs were defined according to the European Perioperative Clinical Outcome standards as occurring within 7 postoperative days. Data of cases meeting the inclusion/exclusion criteria were extracted from the anesthesia information management system to create three datasets: the development dataset (data from Huashan Hospital, Fudan University, 2018-2020), the temporal validation dataset (data from Huashan Hospital, Fudan University, 2021), and the external validation dataset (data from three other hospitals, 2023). Machine learning models based on six algorithms were trained using either 35 retrievable and plausible features or the 11 features selected by Lasso regression. Temporal validation was conducted for all models, and the 11-feature models were also externally validated. Independent risk factors were identified and feature importance in the top models was analyzed.
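Illustrative aside (not the study's code): a minimal scikit-learn sketch, with simulated data and hypothetical feature names, of Lasso-style (L1-penalised) feature selection followed by refitting a compact logistic regression on the retained features.

# Hypothetical sketch of L1-penalised feature selection; the data and feature
# names are placeholders, not the anesthesia information system dataset.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Simulated stand-in for 35 candidate perioperative features and a binary PPC outcome.
X_arr, y = make_classification(n_samples=7533, n_features=35, n_informative=11, random_state=0)
feature_names = [f"feature_{i}" for i in range(35)]   # hypothetical names
X = pd.DataFrame(X_arr, columns=feature_names)

# L1-penalised (Lasso-style) logistic regression shrinks uninformative coefficients to zero.
selector = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(cv=5, penalty="l1", solver="liblinear", scoring="roc_auc", max_iter=1000),
).fit(X, y)
coef = selector.named_steps["logisticregressioncv"].coef_.ravel()
selected = [name for name, c in zip(feature_names, coef) if abs(c) > 1e-6]
print(f"{len(selected)} features retained:", selected[:11])

# Refit a compact model on the reduced feature set (analogous to the 11-feature models).
compact = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X[selected], y)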
RESULTS:
PPCs occurred in 712 of 7533 (9.5%), 258 of 2824 (9.1%), and 207 of 2300 (9.0%) patients in the development, temporal validation and external validation datasets, respectively. During cross-validation training, all models except the Bayes model demonstrated good discrimination, with an area under the receiver operating characteristic curve (AUC) of 0.840. In the temporal validation of the full-feature models, the deep neural network (DNN) performed best, with an AUC of 0.835 (95% confidence interval [CI]: 0.805-0.858) and a Brier score of 0.069, followed by logistic regression (LR), random forest and XGBoost. The 11-feature models performed comparably to the full-feature models, with very close but statistically significantly lower AUCs; DNN and LR remained the top models in the temporal and external validations. An 11-feature nomogram was drawn based on the LR algorithm, and it outperformed the minimally modified Assess respiratory RIsk in Surgical patients in CATalonia (ARISCAT) and Laparoscopic Surgery Video Educational Guidelines (LAS VEGAS) scores with a higher AUC (LR: 0.824, ARISCAT: 0.672, LAS VEGAS: 0.663). Independent risk factors based on multivariate LR mostly overlapped with the Lasso-selected features but were not fully consistent with the important features identified by the Shapley additive explanation (SHAP) method for the LR model.
CONCLUSIONS:
The developed models, especially the DNN model and the nomogram, had good discrimination and calibration, and could be used for predicting PPCs in neurosurgical patients. The establishment of machine learning models and the ascertainment of risk factors might assist clinical decision support for improving surgical outcomes.
TRIAL REGISTRATION
ChiCTR 2100047474; https://www.chictr.org.cn/showproj.html?proj=128279 .
Adult; Aged; Female; Humans; Male; Middle Aged; Algorithms; Lung Diseases/etiology*; Machine Learning; Neurosurgical Procedures/adverse effects*; Postoperative Complications/diagnosis*; Risk Factors; ROC Curve
4. Machine learning models established to distinguish OA and RA based on immune factors in the knee joint fluid.
Qin LIANG ; Lingzhi ZHAO ; Yan LU ; Rui ZHANG ; Qiaolin YANG ; Hui FU ; Haiping LIU ; Lei ZHANG ; Guoduo LI
Chinese Journal of Cellular and Molecular Immunology 2025;41(4):331-338
Objective: To establish machine learning models to distinguish osteoarthritis (OA) from rheumatoid arthritis (RA) based on 25 indicators of the knee joint fluid, including immune factors, cell count classification and smear results.
Methods: A total of 100 OA patients and 40 RA patients scheduled for total knee arthroplasty were enrolled. Knee joint fluid was collected preoperatively from each patient. Nucleated cells were counted and classified. The expression levels of immune factors, including tumor necrosis factor alpha (TNF-α), interleukin-1 beta (IL-1β), IL-6, IL-8, IL-15, matrix metalloproteinase 3 (MMP3), MMP9, MMP13, rheumatoid factor (RF), serum amyloid A (SAA), C-reactive protein (CRP) and others, were measured. Smears and microscopic classification were performed for all samples. Independent influencing factors for OA or RA were identified using univariate binary logistic regression, Lasso regression and multivariate binary logistic regression. Based on the independent influencing factors, three machine learning models were constructed: logistic regression, random forest and support vector machine. Receiver operating characteristic (ROC) curves, calibration curves and decision curve analysis (DCA) were used to evaluate and compare the models.
Results: A total of 5 indicators in the knee joint fluid were screened out to distinguish OA from RA: IL-1β (odds ratio (OR)=10.512, 95% confidence interval (95%CI) 1.048-105.42, P=0.045), IL-6 (OR=1.007, 95%CI 1.001-1.014, P=0.022), MMP9 (OR=3.202, 95%CI 1.235-8.305, P=0.017), MMP13 (OR=1.002, 95%CI 1-1.004, P=0.049) and RF (OR=1.091, 95%CI 1.01-1.179, P=0.026). According to the ROC, calibration curve and DCA results, the random forest model had the highest accuracy (0.979), sensitivity (0.98) and area under the curve (AUC, 0.996, 95%CI 0.991-1). It showed good validity and feasibility, and its discriminating ability was better than that of the other two models.
Conclusion: The machine learning model based on immune factors in the knee joint fluid holds significant value in distinguishing OA from RA. It provides an important reference for early clinical differential diagnosis, prevention and treatment of OA and RA.
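Illustrative aside (not the authors' code): a minimal scikit-learn sketch, with simulated data standing in for the five screened joint-fluid indicators, comparing the three model families named above by cross-validated AUC.

# Hypothetical comparison of logistic regression, random forest and SVM;
# simulated data, with y encoding 0 = OA and 1 = RA.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for the five screened indicators (IL-1beta, IL-6, MMP9, MMP13, RF).
X, y = make_classification(n_samples=140, n_features=5, n_informative=5, n_redundant=0,
                           weights=[0.71, 0.29], random_state=0)

models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random forest": RandomForestClassifier(n_estimators=500, random_state=0),
    "support vector machine": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC = {auc.mean():.3f}")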
Humans; Arthritis, Rheumatoid/metabolism*; Machine Learning; Male; Female; Middle Aged; Aged; Synovial Fluid/immunology*; Osteoarthritis, Knee/metabolism*; Knee Joint/metabolism*; ROC Curve; Diagnosis, Differential
5. Value of biomarkers related to routine blood tests in early diagnosis of allergic rhinitis in children.
Jinjie LI ; Xiaoyan HAO ; Yijuan XIN ; Rui LI ; Lin ZHU ; Xiaoli CHENG ; Liu YANG ; Jiayun LIU
Chinese Journal of Cellular and Molecular Immunology 2025;41(4):339-347
Objective: To mine and analyze routine blood test data of children with allergic rhinitis (AR), identify routine blood parameters related to childhood AR, establish an effective diagnostic model, and evaluate its performance.
Methods: This was a retrospective study of clinical cases. The experimental group comprised 1110 children diagnosed with AR at the First Affiliated Hospital of Air Force Medical University between December 12, 2020 and December 12, 2021, while the control group included 1109 children without a history of allergic rhinitis or other allergic diseases who underwent routine physical examinations during the same period. Age, sex and routine blood test results were collected for all subjects. The levels of routine blood test indicators were compared between AR children and healthy children using comprehensive intelligent baseline analysis, with indicators of P≥0.05 excluded; variables were then screened by Lasso regression. Binary logistic regression was used to further evaluate the influence of multiple routine blood indexes on the outcome. Five machine learning algorithms, namely extreme gradient boosting (XGBoost), logistic regression (LR), light gradient boosting machine (LightGBM), random forest (RF) and adaptive boosting (AdaBoost), were used to establish diagnostic models. The receiver operating characteristic (ROC) curve was used to select the optimal model, and the best-performing LightGBM algorithm was used to build an online patient risk assessment tool for clinical application.
Results: Statistically significant differences were observed between the AR group and the control group in the following routine blood test indicators: mean corpuscular hemoglobin concentration (MCHC), hemoglobin (HGB), absolute basophil count (BASO), absolute eosinophil count (EOS), large platelet ratio (P-LCR), mean platelet volume (MPV), platelet distribution width (PDW), platelet count (PLT), absolute neutrophil count (W-LCC), absolute monocyte count (W-MCC), absolute lymphocyte count (W-SCC), and age. Lasso regression identified these variables as important predictors, and binary logistic regression further confirmed their significant influence on the outcome. The optimal machine learning algorithm, LightGBM, was used to establish a multi-index joint detection model, which showed robust prediction performance with an AUC of 0.8512 in the training set and 0.8103 in the internal validation set.
Conclusion: The identified routine blood parameters can serve as potential biomarkers for early diagnosis and risk assessment of AR and can improve the accuracy and efficiency of diagnosis. The established model provides a scientific basis for more accurate diagnostic tools and personalized prevention strategies. Future studies should prospectively validate these findings and explore their applicability in other related diseases.
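Illustrative aside (not the study's code): a minimal sketch, assuming the lightgbm and scikit-learn packages and simulated data in place of the routine blood parameters, of training a LightGBM classifier and reporting a validation AUC.

# Hypothetical LightGBM sketch for a routine-blood-test diagnostic model;
# placeholder data, not the study's cohort.
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2219, n_features=12, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

clf = lgb.LGBMClassifier(n_estimators=400, learning_rate=0.05, random_state=0)
clf.fit(X_train, y_train)
print("validation AUC:", roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1]))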
Humans; Male; Female; Rhinitis, Allergic/blood*; Child; Biomarkers/blood*; Retrospective Studies; Early Diagnosis; Child, Preschool; ROC Curve; Logistic Models; Hematologic Tests; Algorithms; Adolescent; Machine Learning
6. Explainable machine learning model for predicting septic shock in critically ill sepsis patients based on coagulation indexes: A multicenter cohort study.
Qing-Bo ZENG ; En-Lan PENG ; Ye ZHOU ; Qing-Wei LIN ; Lin-Cui ZHONG ; Long-Ping HE ; Nian-Qing ZHANG ; Jing-Chun SONG
Chinese Journal of Traumatology 2025;28(6):404-411
PURPOSE:
Septic shock is associated with high mortality and poor outcomes among sepsis patients with coagulopathy. Although traditional statistical methods or machine learning (ML) algorithms have been proposed to predict septic shock, these potential approaches have never been systematically compared. The present work aimed to develop and compare models to predict septic shock among patients with sepsis.
METHODS:
This was a retrospective cohort study of 484 patients with sepsis admitted to our intensive care units between May 2018 and November 2022. Patients from the 908th Hospital of Chinese PLA Logistical Support Force and Nanchang Hongdu Hospital of Traditional Chinese Medicine were allocated to the training (n=311) and validation (n=173) sets, respectively. All clinical and laboratory data of the sepsis patients, including comprehensive coagulation indexes, were collected. We developed 5 models based on ML algorithms and 1 model based on a traditional statistical method to predict septic shock in the training cohort. The performance of all models was assessed using the area under the receiver operating characteristic curve and calibration plots. Decision curve analysis was used to evaluate the net benefit of the models. The validation set was used to verify the predictive accuracy of the models. The Shapley additive explanations (SHAP) method was also used to assess variable importance and explain the predictions made by the ML algorithm.
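Illustrative aside (not the study's implementation): a minimal sketch, assuming scikit-learn and the shap package with simulated data, of fitting an SVM classifier and estimating feature importance with the model-agnostic KernelExplainer; the feature names below are those reported for the final model and are used here only as labels.

# Hypothetical SVM + SHAP sketch; simulated data, not the sepsis cohort.
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

feature_names = ["age", "tPAI-C", "PT", "INR", "WBC", "PLT"]
X, y = make_classification(n_samples=311, n_features=6, n_informative=6, n_redundant=0, random_state=0)

svm = make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)).fit(X, y)

# KernelExplainer is model-agnostic, so it also works for an SVM; a small
# background sample keeps the computation tractable.
explainer = shap.KernelExplainer(lambda data: svm.predict_proba(data)[:, 1], X[:50])
shap_values = explainer.shap_values(X[:20])
mean_abs = np.abs(shap_values).mean(axis=0)
for name, value in zip(feature_names, mean_abs):
    print(f"{name}: mean |SHAP| = {value:.3f}")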
RESULTS:
Among all patients, 37.2% experienced septic shock. The areas under the receiver operating characteristic curves of the 6 models ranged from 0.833 to 0.962 in the training set and from 0.630 to 0.744 in the validation set. The model with the best prediction performance was based on the support vector machine (SVM) algorithm and was constructed from age, tissue plasminogen activator-inhibitor complex, prothrombin time, international normalized ratio, white blood cell count, and platelet count. The SVM model showed good calibration and discrimination and a greater net benefit in decision curve analysis.
CONCLUSION
The SVM algorithm may be superior to other ML and traditional statistical algorithms for predicting septic shock. Physicians can better understand the reliability of the predictive model through SHAP value analysis.
Humans; Shock, Septic/blood*; Machine Learning; Male; Female; Retrospective Studies; Middle Aged; Aged; Sepsis/complications*; ROC Curve; Cohort Studies; Adult; Intensive Care Units; Algorithms; Blood Coagulation; Critical Illness
7. A deep learning method for differentiating nasopharyngeal carcinoma and lymphoma based on MRI.
Yuchen TANG ; Hongli HUA ; Yan WANG ; Zezhang TAO
Journal of Clinical Otorhinolaryngology Head and Neck Surgery 2025;39(7):597-609
Objective: To develop a deep learning (DL) model based on conventional MRI for automatic segmentation and differential diagnosis of nasopharyngeal carcinoma (NPC) and nasopharyngeal lymphoma (NPL).
Methods: This retrospective study included 142 patients with NPL and 292 patients with NPC who underwent conventional MRI at Renmin Hospital of Wuhan University from June 2012 to February 2023. MRI scans from 80 patients were manually segmented to train the segmentation model. The automatically segmented regions of interest (ROIs) formed four datasets: T1-weighted images (T1WI), T2-weighted images (T2WI), T1-weighted contrast-enhanced images (T1CE), and a combination of T1WI and T2WI. An ImageNet-pretrained ResNet101 model was fine-tuned for the classification task. Statistical analysis was conducted using SPSS 22.0. The Dice coefficient loss was used to evaluate the performance of the segmentation task. Diagnostic performance was assessed using receiver operating characteristic (ROC) curves. Gradient-weighted class activation mapping (Grad-CAM) was used to visualize the model's decisions.
Results: The Dice score of the segmentation model reached 0.876 in the testing set. The AUC values of the classification models in the testing set were as follows: T1WI, 0.78 (95%CI 0.67-0.81); T2WI, 0.75 (95%CI 0.72-0.86); T1CE, 0.84 (95%CI 0.76-0.87); and T1WI+T2WI, 0.93 (95%CI 0.85-0.94). The AUC values for the two clinicians were 0.77 (95%CI 0.72-0.82) for the junior clinician and 0.84 (95%CI 0.80-0.89) for the senior clinician. Grad-CAM analysis revealed that the central region of the tumor was highly correlated with the model's classification decisions, while the correlation was lower in the peripheral regions.
Conclusion: The deep learning model performed well in differentiating NPC from NPL based on conventional MRI, with the T1WI+T2WI combination model showing the best performance. The model can assist in the early diagnosis of NPC and NPL, facilitating timely and standardized treatment, which may improve patient prognosis.
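Illustrative aside (not the authors' code): a minimal PyTorch sketch of the soft Dice loss commonly used to train and evaluate segmentation models of this kind.

# Hypothetical soft Dice loss for binary segmentation masks.
import torch

def soft_dice_loss(pred_probs: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """pred_probs and target are (N, H, W) tensors with values in [0, 1]."""
    intersection = (pred_probs * target).sum(dim=(1, 2))
    union = pred_probs.sum(dim=(1, 2)) + target.sum(dim=(1, 2))
    dice = (2 * intersection + eps) / (union + eps)
    return 1 - dice.mean()

# Example: a perfect prediction gives a loss close to 0.
mask = (torch.rand(2, 64, 64) > 0.5).float()
print(soft_dice_loss(mask, mask))  # ~0.0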
Humans; Nasopharyngeal Carcinoma/diagnostic imaging*; Deep Learning; Magnetic Resonance Imaging; Retrospective Studies; Nasopharyngeal Neoplasms/diagnostic imaging*; Lymphoma/diagnostic imaging*; Diagnosis, Differential; ROC Curve; Male; Female; Middle Aged; Adult
8. A multi-constraint representation learning model for identification of ovarian cancer with missing laboratory indicators.
Zihan LU ; Fangjun HUANG ; Guangyao CAI ; Jihong LIU ; Xin ZHEN
Journal of Southern Medical University 2025;45(1):170-178
OBJECTIVES:
To evaluate the performance of a multi-constraint representation learning classification model for identifying ovarian cancer with missing laboratory indicators.
METHODS:
Tabular data with missing laboratory indicators were collected from 393 patients with ovarian cancer and 1951 control patients. The laboratory indicator features with missing values were projected into a latent space to obtain a classification model, using a representation learning classification approach based on discriminative learning and mutual information, coupled with feature projection significance score consistency and missing location estimation. The proposed constraint terms were ablated experimentally to assess their feasibility and validity in terms of accuracy, area under the ROC curve (AUC), sensitivity, and specificity. Cross-validation with the same metrics (accuracy, AUC, sensitivity and specificity) was also used to compare the discriminative performance of this classification model with that of other imputation methods for handling the missing data.
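Illustrative aside (not the proposed model): a minimal scikit-learn sketch, with simulated data and artificially introduced missingness, of the kind of imputation baselines such a representation learning model is typically compared against.

# Hypothetical imputation baselines for classification with missing values;
# simulated data, not the ovarian cancer cohort.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer, KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=2344, n_features=15, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.2] = np.nan   # simulate 20% missing laboratory values

baselines = {"mean": SimpleImputer(strategy="mean"), "knn": KNNImputer(n_neighbors=5)}
for name, imputer in baselines.items():
    clf = make_pipeline(imputer, StandardScaler(), LogisticRegression(max_iter=1000))
    auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
    print(f"{name} imputation: mean AUC = {auc.mean():.3f}")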
RESULTS:
The results of the ablation experiments showed good compatibility among the constraints, and each constraint had good robustness. The cross-validation experiments showed that, for identification of ovarian cancer with missing laboratory indicators, the AUC, accuracy, sensitivity and specificity of the proposed multi-constraint representation learning classification model were 0.915, 0.888, 0.774, and 0.910, respectively, and its AUC and sensitivity were superior to those of the other imputation methods.
CONCLUSIONS
The proposed model has excellent discriminatory ability, with better performance than other missing-data imputation methods, for identification of ovarian cancer with missing laboratory indicators.
Female; Humans; Ovarian Neoplasms/diagnosis*; Machine Learning; ROC Curve
9. Establishment and evaluation of a machine learning prediction model for sepsis-related encephalopathy in the elderly.
Xiao YUE ; Yiwen WANG ; Zhifang LI ; Lei WANG ; Li HUANG ; Shuo WANG ; Yiming HOU ; Shu ZHANG ; Zhengbin WANG
Chinese Critical Care Medicine 2025;37(10):937-943
OBJECTIVE:
To construct a machine learning prediction model for sepsis-associated encephalopathy (SAE) and analyze its value for the early identification of SAE risk in elderly septic patients.
METHODS:
Patients aged over 60 years with a primary diagnosis of sepsis admitted to the intensive care unit (ICU) between 2008 and 2023 were selected from the Medical Information Mart for Intensive Care-IV 2.2 (MIMIC-IV 2.2) database. Demographic variables, disease severity scores, comorbidities, interventions, laboratory indicators, and hospitalization details were collected. Key factors associated with SAE were identified using univariate logistic regression analysis. The data were randomly divided into training and validation sets in a 7:3 ratio. Multivariable logistic regression analysis was conducted in the training set and visualized as a nomogram for the prediction of SAE. The discrimination of the model was evaluated in the validation set using the receiver operating characteristic (ROC) curve, and its calibration was assessed using a calibration curve. Furthermore, multiple machine learning models, including multi-layer perceptron (MLP), support vector machine (SVM), naive Bayes (NB), gradient boosting machine (GBM), random forest (RF), and extreme gradient boosting (XGB), were constructed on the training set, and their predictive performance was evaluated on the validation set. Taking the XGB model as an example, model interpretability was enhanced through the SHapley Additive exPlanations (SHAP) algorithm to identify the key predictive factors and their contributions.
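Illustrative aside (not the study's code): a minimal sketch, assuming the xgboost and shap packages and simulated data in place of the MIMIC-IV cohort, of fitting an XGBoost classifier and ranking features by mean absolute SHAP value.

# Hypothetical XGBoost + SHAP sketch with simulated data.
import numpy as np
import shap
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2204, n_features=21, weights=[0.62, 0.38], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

model = xgb.XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05, random_state=0)
model.fit(X_train, y_train)

# TreeExplainer gives exact SHAP values for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_val)
importance = np.abs(shap_values).mean(axis=0)
print("top simulated features by mean |SHAP| (indices):", np.argsort(importance)[::-1][:5])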
RESULTS:
A total of 2 204 septic patients were finally enrolled, of whom 840 (38.1%) developed SAE. A total of 21 variables associated with SAE were screened through univariate logistic regression analysis. Multivariable logistic regression analysis showed that endotracheal intubation [odds ratio (OR) = 0.40, 95% confidence interval (95%CI) 0.19-0.88, P < 0.001], oxygen therapy (OR = 0.76, 95%CI 0.53-0.95, P = 0.023), tracheotomy (OR = 0.20, 95%CI 0.07-0.53, P < 0.001), continuous renal replacement therapy (CRRT; OR = 0.32, 95%CI 0.15-0.70, P < 0.001), cerebrovascular disease (OR = 0.31, 95%CI 0.16-0.60, P < 0.001), rheumatic disease (OR = 0.44, 95%CI 0.19-0.99, P < 0.001), male sex (OR = 0.68, 95%CI 0.54-0.86, P = 0.001), and maximum anion gap (AG; OR = 0.95, 95%CI 0.93-0.97, P < 0.001) were associated with a decreased probability of SAE, whereas age (OR = 1.05, 95%CI 1.03-1.06, P < 0.001), acute physiology score III (APSIII; OR = 1.02, 95%CI 1.01-1.02, P < 0.001), Oxford acute severity of illness score (OASIS; OR = 1.04, 95%CI 1.03-1.06, P < 0.001), and length of hospital stay (OR = 1.01, 95%CI 1.01-1.02, P < 0.001) were associated with an increased probability of SAE. A nomogram model was constructed based on these variables. In the validation set, ROC curve analysis showed that the model achieved an area under the ROC curve (AUC) of 0.723, and the calibration curve showed good consistency between the predicted and observed probabilities. Among the machine learning algorithms (MLP, SVM, NB, GBM, RF, and XGB), the SVM and RF models demonstrated relatively good predictive performance, with AUCs of 0.748 and 0.739, respectively, and sensitivities both exceeding 85%. The predictive performance of the XGB model was explained through SHAP analysis, which indicated that the APSIII score (SHAP value 0.871), age (SHAP value 0.521), and OASIS score (SHAP value 0.443) were the most important contributors to the model's predictions.
CONCLUSIONS
The machine learning-based SAE prediction model exhibits good predictive capability and holds significant application value for the early identification of SAE risk in elderly septic patients.
Humans; Machine Learning; Aged; Sepsis-Associated Encephalopathy; Sepsis/complications*; Intensive Care Units; Logistic Models; Middle Aged; Male; ROC Curve; Female; Bayes Theorem; Nomograms; Support Vector Machine; Algorithms
10. Identification of osteoid and chondroid matrix mineralization in primary bone tumors using a deep learning fusion model based on CT and clinical features: a multi-center retrospective study.
Caolin LIU ; Qingqing ZOU ; Menghong WANG ; Qinmei YANG ; Liwen SONG ; Zixiao LU ; Qianjin FENG ; Yinghua ZHAO
Journal of Southern Medical University 2024;44(12):2412-2420
METHODS:
We retrospectively collected CT scan data from 276 patients with pathologically confirmed primary bone tumors from 4 medical centers in Guangdong Province between January 2010 and August 2021. A convolutional neural network (CNN) was employed as the deep learning architecture. The optimal baseline deep learning model (R-Net) was determined through transfer learning, and an optimized model (S-Net) was obtained through algorithmic improvements. Multivariate logistic regression analysis was used to screen clinical features such as sex, age, mineralization location, and pathological fracture, which were then combined with the imaging features to construct the deep learning fusion model (SC-Net). The diagnostic performance of the SC-Net model and the machine learning models was compared with radiologists' diagnoses, and classification performance was evaluated using the area under the receiver operating characteristic curve (AUC) and the F1 score.
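Illustrative aside (not the SC-Net implementation): a minimal PyTorch sketch of a fusion head that concatenates pooled CNN image features with a small clinical feature vector (e.g. sex, age, mineralization location, pathological fracture) before classification; the backbone choice here is arbitrary.

# Hypothetical image-plus-clinical fusion network; not the authors' architecture.
import torch
import torch.nn as nn
from torchvision import models

class FusionNet(nn.Module):
    def __init__(self, n_clinical: int, n_classes: int = 2):
        super().__init__()
        backbone = models.resnet50(weights=None)      # any CNN backbone would do
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()                   # keep the pooled image features
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(feat_dim + n_clinical, 128),
            nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, image: torch.Tensor, clinical: torch.Tensor) -> torch.Tensor:
        img_feat = self.backbone(image)               # (N, feat_dim)
        fused = torch.cat([img_feat, clinical], dim=1)
        return self.head(fused)

net = FusionNet(n_clinical=4)
logits = net(torch.randn(2, 3, 224, 224), torch.randn(2, 4))
print(logits.shape)  # torch.Size([2, 2])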
RESULTS:
In the external test set, the fusion model (SC-Net) achieved the best performance with an AUC of 0.901 (95% CI: 0.803-1.00), an accuracy of 83.7% (95% CI: 69.3%-93.2%) and an F1 score of 0.857, and outperformed the S-Net model with an AUC of 0.818 (95% CI: 0.694-0.942), an accuracy of 76.7% (95% CI: 61.4%-88.2%), and an F1 score of 0.828. The overall classification performance of the fusion model (SC-Net) exceeded that of radiologists' diagnoses.
CONCLUSIONS
The deep learning fusion model based on multi-center CT images and clinical features is capable of accurately classifying osteoid and chondroid matrix mineralization and may improve the accuracy of clinical diagnosis of osteogenic versus chondrogenic primary bone tumors.
Humans; Deep Learning; Bone Neoplasms/diagnostic imaging*; Retrospective Studies; Tomography, X-Ray Computed/methods*; Neural Networks, Computer; Male; Female; ROC Curve; Algorithms
