Use of deep learning model for paediatric elbow radiograph binomial classification: initial experience, performance and lessons learnt.
10.4103/singaporemedj.SMJ-2022-078
- Authors: Mark Bangwei TAN (1); Yuezhi Russ CHUA (2); Qiao FAN (3); Marielle Valerie FORTIER (4); Peiqi Pearlly CHANG (5)
- Author Information:
1. Department of Diagnostic Radiology, Singapore General Hospital, Singapore.
2. Agency for Science, Technology and Research, Singapore.
3. Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore.
4. Department of Diagnostic and Interventional Imaging, KK Women's and Children's Hospital, Singapore.
5. Department of Paediatrics, KK Women's and Children's Hospital, Singapore.
- Publication Type: Journal Article
- Keywords:
Artificial intelligence;
emergency radiology;
machine learning;
musculoskeletal radiology;
paediatric radiology
- MeSH:
Humans;
Deep Learning;
Child;
Retrospective Studies;
Male;
Female;
Radiography/methods*;
ROC Curve;
Elbow/diagnostic imaging*;
Neural Networks, Computer;
Child, Preschool;
Elbow Joint/diagnostic imaging*;
Emergency Service, Hospital;
Adolescent;
Infant;
Artificial Intelligence
- From: Singapore Medical Journal 2025;66(4):208-214
- Country: Singapore
- Language: English
Abstract:
INTRODUCTION: In this study, we aimed to compare the performance of a convolutional neural network (CNN)-based deep learning model, trained on a dataset of normal and abnormal paediatric elbow radiographs, with that of paediatric emergency department (ED) physicians on a binomial classification task.
METHODS: A total of 1,314 paediatric elbow lateral radiographs (patient mean age 8.2 years) were retrospectively retrieved and classified, based on annotation, as normal or abnormal (with pathology). They were then randomly partitioned into a development set (993 images); first and second tuning (validation) sets (109 and 100 images, respectively); and a test set (112 images). An artificial intelligence (AI) model was trained on the development set using the EfficientNet B1 network architecture. Its performance on the test set was compared to that of five physicians (inter-rater agreement: fair). The performance of the AI model and the physician group was compared using the McNemar test.
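The four-way random partition described above (993/109/100/112 images) can be sketched as follows. This is a minimal illustration, not the authors' code; the placeholder identifiers and the random seed are assumptions for reproducibility of the example.

```python
import random

def partition(images, seed=42):
    """Randomly split radiographs into the four subsets from the abstract:
    development (993), first tuning (109), second tuning (100), test (112).
    The seed is an illustrative choice, not from the study."""
    shuffled = list(images)
    random.Random(seed).shuffle(shuffled)
    dev = shuffled[:993]
    tune1 = shuffled[993:1102]
    tune2 = shuffled[1102:1202]
    test = shuffled[1202:1314]
    return dev, tune1, tune2, test

# Placeholder identifiers standing in for the 1,314 radiographs
ids = [f"radiograph_{i:04d}" for i in range(1314)]
dev, tune1, tune2, test = partition(ids)
print(len(dev), len(tune1), len(tune2), len(test))  # 993 109 100 112
```

Partitioning at the image (or, in practice, patient) level before training prevents leakage between the development, tuning and test sets.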
RESULTS: The accuracy of the AI model on the test set was 80.4% (95% confidence interval [CI] 71.8%-87.3%), and the area under the receiver operating characteristic curve (AUROC) was 0.872 (95% CI 0.831-0.947). The performance of the AI model vs. the physician group on the test set was: sensitivity 79.0% (95% CI 68.4%-89.5%) vs. 64.9% (95% CI 52.5%-77.3%; P = 0.088); and specificity 81.8% (95% CI 71.6%-92.0%) vs. 87.3% (95% CI 78.5%-96.1%; P = 0.439).
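The McNemar test used above operates on paired predictions over the same test cases, considering only the discordant pairs (cases one rater classified correctly and the other did not). A minimal sketch of the exact (binomial) form; the discordant counts below are hypothetical, not the study's actual data:

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact McNemar test.
    b = cases only the AI model classified correctly,
    c = cases only the physicians classified correctly.
    Returns a two-sided P-value from a Binomial(b + c, 0.5) model,
    computed as twice the smaller tail probability, capped at 1."""
    n = b + c
    k = min(b, c)
    p_one_tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * p_one_tail)

# Hypothetical discordant-pair counts, for illustration only
print(round(mcnemar_exact(14, 5), 3))  # → 0.064
```

Because concordant pairs carry no information about which rater is better, the test's power depends only on the number of discordant cases, which is why modest P-values (such as P = 0.088 for sensitivity here) are common on small test sets.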
CONCLUSION: The AI model achieved a good AUROC and higher sensitivity than the physician group, with the difference in sensitivity at nominal significance.