1. Design Characteristics of Studies Reporting the Performance of Artificial Intelligence Algorithms for Diagnostic Analysis of Medical Images: Results from Recently Published Papers
Dong Wook KIM ; Hye Young JANG ; Kyung Won KIM ; Youngbin SHIN ; Seong Ho PARK
Korean Journal of Radiology 2019;20(3):405-410
OBJECTIVE: To evaluate the design characteristics of studies that assessed the performance of artificial intelligence (AI) algorithms for the diagnostic analysis of medical images. MATERIALS AND METHODS: The PubMed MEDLINE and Embase databases were searched to identify original research articles published between January 1, 2018 and August 17, 2018 that investigated the performance of AI algorithms analyzing medical images to provide diagnostic decisions. Eligible articles were evaluated to determine 1) whether the study used external validation rather than internal validation, and, in the case of external validation, whether the validation data were collected 2) with a diagnostic cohort design rather than a diagnostic case-control design, 3) from multiple institutions, and 4) in a prospective manner. These are fundamental methodologic features recommended for clinical validation of AI performance in real-world practice. Studies that fulfilled the above criteria were identified, the publishing journals were classified as medical or non-medical, and the results were compared between the two journal groups. RESULTS: Of 516 eligible published studies, only 6% (31 studies) performed external validation. None of the 31 studies adopted all three design features for external validation: diagnostic cohort design, inclusion of multiple institutions, and prospective data collection. No significant difference was found between medical and non-medical journals. CONCLUSION: Nearly all studies published during the study period that evaluated the performance of AI algorithms for diagnostic analysis of medical images were designed as proof-of-concept technical feasibility studies and lacked the design features recommended for robust validation of the real-world clinical performance of AI algorithms.
Keywords: Artificial Intelligence; Case-Control Studies; Cohort Studies; Data Collection; Feasibility Studies; Machine Learning; Prospective Studies
2. Test-retest repeatability of ultrasonographic shear wave elastography in a rat liver fibrosis model: toward a quantitative biomarker for preclinical trials
Youngbin SHIN ; Jimi HUH ; Su Jung HAM ; Young Chul CHO ; Yoonseok CHOI ; Dong-Cheol WOO ; Jeongjin LEE ; Kyung Won KIM
Ultrasonography 2021;40(1):126-135
Purpose: This study evaluated the test-retest repeatability and measurement variability of ultrasonographic shear wave elastography (SWE) for liver stiffness in a rat liver fibrosis model. Methods: In 31 Sprague-Dawley rats divided into three groups (high-dose, low-dose, and control), liver fibrosis was induced by intraperitoneal administration of thioacetamide for 8 weeks. A dedicated radiographer performed SWE to measure liver stiffness in kilopascals in two sessions at a 3-day interval. We calculated the correlation between liver stiffness and histopathologic results, measurement variability within each session using coefficients of variation (CoVs) and the interquartile range divided by the median (IQR/M), and test-retest repeatability between the two sessions using the repeatability coefficient. Results: Different levels of liver fibrosis were successfully induced in each group of the animal model. The mean liver stiffness values were 8.88±1.48 kPa in the control group, 11.62±1.70 kPa in the low-dose group, and 11.91±1.73 kPa in the high-dose group. The correlation between collagen areas and liver stiffness values was moderate (r=0.6). In all groups, the second session yielded lower CoVs (i.e., more consistent measurements) than the first session, suggesting a training effect for the operator. The mean IQR/M was also lower in the second session than in the first session, which contained four outliers (0.21 vs. 0.12, P<0.001). The test-retest repeatability coefficient was 3.75 kPa and decreased to 2.82 kPa after the four outliers were removed. Conclusion: Ultrasonographic SWE was feasible and repeatable for evaluating liver fibrosis in preclinical trials. Operator training might reduce variability in liver stiffness measurements.
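To make the variability indices above concrete, here is a minimal Python sketch (not code from the study) of how the coefficient of variation, IQR/M, and a Bland-Altman-style repeatability coefficient can be computed from stiffness measurements; all values, the array names, and the assumption of ten acquisitions per session and five animals are hypothetical.

```python
import numpy as np

# Hypothetical SWE stiffness measurements (kPa) for one animal: ten acquisitions
# per session in each of two sessions. Values are illustrative, not study data.
session1 = np.array([9.1, 8.7, 10.2, 9.5, 8.9, 9.8, 10.5, 9.0, 9.3, 9.7])
session2 = np.array([9.4, 9.2, 9.6, 9.1, 9.5, 9.3, 9.8, 9.0, 9.4, 9.6])

def cov(x):
    """Coefficient of variation: sample SD divided by the mean."""
    return np.std(x, ddof=1) / np.mean(x)

def iqr_over_median(x):
    """Interquartile range divided by the median (IQR/M), a robust variability index."""
    q1, q3 = np.percentile(x, [25, 75])
    return (q3 - q1) / np.median(x)

print(f"Session 1: CoV={cov(session1):.3f}, IQR/M={iqr_over_median(session1):.3f}")
print(f"Session 2: CoV={cov(session2):.3f}, IQR/M={iqr_over_median(session2):.3f}")

# Test-retest repeatability coefficient across animals, using one mean stiffness
# value per animal per session (hypothetical per-animal means for five animals).
means_s1 = np.array([9.3, 11.8, 12.1, 8.6, 11.2])   # session 1 means (kPa)
means_s2 = np.array([9.0, 11.5, 12.6, 8.9, 11.6])   # session 2 means (kPa)

diffs = means_s1 - means_s2
within_subject_sd = np.sqrt(np.mean(diffs ** 2) / 2)               # Sw from paired differences
repeatability_coefficient = 1.96 * np.sqrt(2) * within_subject_sd  # approx. 2.77 * Sw
print(f"Repeatability coefficient: {repeatability_coefficient:.2f} kPa")
```

Whether the repeatability coefficient is computed on per-animal means or on all raw acquisitions is a design choice; the study's exact procedure is not reproduced here.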
3. Selection and Reporting of Statistical Methods to Assess Reliability of a Diagnostic Test: Conformity to Recommended Methods in a Peer-Reviewed Journal.
Ji Eun PARK ; Kyunghwa HAN ; Yu Sub SUNG ; Mi Sun CHUNG ; Hyun Jung KOO ; Hee Mang YOON ; Young Jun CHOI ; Seung Soo LEE ; Kyung Won KIM ; Youngbin SHIN ; Suah AN ; Hyo Min CHO ; Seong Ho PARK
Korean Journal of Radiology 2017;18(6):888-897
OBJECTIVE: To evaluate the frequency and adequacy of statistical analyses in a general radiology journal when reporting a reliability analysis for a diagnostic test. MATERIALS AND METHODS: Sixty-three studies of diagnostic test accuracy (DTA) and 36 studies reporting reliability analyses published in the Korean Journal of Radiology between 2012 and 2016 were analyzed. Studies were judged using the methodological guidelines of the Radiological Society of North America-Quantitative Imaging Biomarkers Alliance (RSNA-QIBA) and the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative. DTA studies were evaluated by nine editorial board members of the journal. Reliability studies were evaluated by study reviewers experienced with reliability analysis. RESULTS: Thirty-one (49.2%) of the 63 DTA studies did not include a reliability analysis when one was deemed necessary. Among the 36 reliability studies, proper statistical methods were used in all (5/5) studies dealing with dichotomous/nominal data, 46.7% (7/15) of studies dealing with ordinal data, and 95.2% (20/21) of studies dealing with continuous data. Statistical methods were described in sufficient detail regarding weighted kappa in 28.6% (2/7) of studies, and regarding the model and assumptions of the intraclass correlation coefficient in 35.3% (6/17) and 29.4% (5/17) of studies, respectively. Reliability parameters were used as if they were agreement parameters in 23.1% (3/13) of studies. The terms reproducibility and repeatability were used incorrectly in 20% (3/15) of studies. CONCLUSION: Greater attention to the importance of reporting reliability, thorough description of the related statistical methods, efforts not to neglect agreement parameters, and better use of relevant terminology are necessary.
Keywords: Biomarkers; Diagnostic Tests, Routine*; Methods*
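The reliability abstract above notes that the weighting scheme of weighted kappa and the model and assumptions behind the intraclass correlation coefficient (ICC) are often under-reported. Purely as an illustration (not from the paper), the Python sketch below makes both choices explicit, using scikit-learn's cohen_kappa_score for ordinal ratings and a hand-coded ICC(2,1) (two-way random effects, absolute agreement, single measurement); the ratings and the choice of ICC(2,1) are assumptions for demonstration.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical ordinal scores (4-point scale) assigned to 10 cases by two readers.
reader1 = [1, 2, 2, 3, 4, 1, 3, 2, 4, 3]
reader2 = [1, 2, 3, 3, 4, 1, 2, 2, 4, 4]

# Weighted kappa for ordinal data: report the weighting scheme explicitly.
kappa_lin = cohen_kappa_score(reader1, reader2, weights="linear")
kappa_quad = cohen_kappa_score(reader1, reader2, weights="quadratic")
print(f"linear-weighted kappa={kappa_lin:.2f}, quadratic-weighted kappa={kappa_quad:.2f}")

def icc_2_1(ratings):
    """ICC(2,1) per Shrout & Fleiss: two-way random-effects model, absolute
    agreement, single measurement. `ratings` is an (n subjects x k raters) array."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand_mean = ratings.mean()
    ss_rows = k * ((ratings.mean(axis=1) - grand_mean) ** 2).sum()
    ss_cols = n * ((ratings.mean(axis=0) - grand_mean) ** 2).sum()
    ss_error = ((ratings - grand_mean) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)                 # between-subject mean square
    ms_cols = ss_cols / (k - 1)                 # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Hypothetical continuous measurements (e.g., lesion size in mm) by two raters.
continuous = np.array([[10.1, 10.4], [15.3, 14.9], [8.2, 8.8], [20.5, 21.0], [12.0, 11.6]])
print(f"ICC(2,1) = {icc_2_1(continuous):.2f}")
```

Stating which kappa weights and which ICC form were used is exactly the level of detail the abstract finds missing in many studies.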
4. Evolution of Radiological Treatment Response Assessments for Cancer Immunotherapy: From iRECIST to Radiomics and Artificial Intelligence
Nari KIM ; Eun Sung LEE ; Sang Eun WON ; Mihyun YANG ; Amy Junghyun LEE ; Youngbin SHIN ; Yousun KO ; Junhee PYO ; Hyo Jung PARK ; Kyung Won KIM
Korean Journal of Radiology 2022;23(11):1089-1101
Immunotherapy has revolutionized cancer treatment and opened a new paradigm. In the era of immunotherapy and molecular targeted therapy, precision medicine has gained emphasis, and early response assessment is a key element of this approach. Treatment response assessment for immunotherapy is challenging for radiologists because of the rapid development of immunotherapeutic agents, from immune checkpoint inhibitors to chimeric antigen receptor T cells, with which many radiologists may not be familiar, and because of atypical responses to therapy, such as pseudoprogression and hyperprogression. Therefore, new response assessment methods, such as immune response assessment, functional/molecular imaging biomarkers, and artificial intelligence (including radiomics and machine learning approaches), have been developed and investigated. Radiologists should be aware of recent trends in immunotherapy development and of these new response assessment methods.
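The review above turns on why immunotherapy needs modified response criteria such as iRECIST, in which an apparent progression must be confirmed before treatment is abandoned. As a purely illustrative aid (not from the paper and not a faithful implementation of the official criteria), the Python sketch below shows this confirmation idea for pseudoprogression; non-target lesions, new-lesion measurement rules, and the 4-8-week confirmation window are omitted, and the 5 mm confirmation margin and all function names are assumptions.

```python
def recist_category(baseline_sld, nadir_sld, current_sld, new_lesion=False):
    """Simplified RECIST 1.1 target-lesion response based on the sum of lesion
    diameters (SLD, mm). Non-target lesions and nodal rules are ignored."""
    if new_lesion:
        return "PD"
    if current_sld == 0:
        return "CR"                                    # all target lesions resolved
    if current_sld >= nadir_sld + 5 and current_sld >= 1.2 * nadir_sld:
        return "PD"                                    # >=20% and >=5 mm above nadir
    if current_sld <= 0.7 * baseline_sld:
        return "PR"                                    # >=30% decrease from baseline
    return "SD"

def irecist_category(sld_at_iupd, recist_now, current_sld):
    """Simplified iRECIST confirmation logic: a first progression is unconfirmed
    (iUPD) and becomes confirmed (iCPD) only if the tumor burden grows further
    (here, >= 5 mm beyond the SLD recorded at iUPD) at the next assessment."""
    if recist_now != "PD":
        return "i" + recist_now        # iCR / iPR / iSD; the iUPD "bar" is reset
    if sld_at_iupd is None:
        return "iUPD"                  # first apparent progression: unconfirmed
    if current_sld >= sld_at_iupd + 5:
        return "iCPD"                  # further growth confirms progression
    return "iUPD"                      # progression remains unconfirmed

# Pseudoprogression pattern: apparent progression that later regresses.
baseline = nadir = 50.0                                 # baseline SLD (mm)
v1 = recist_category(baseline, nadir, 62.0)             # "PD" (+24%, +12 mm)
print(irecist_category(None, v1, 62.0))                 # -> iUPD (unconfirmed)
v2 = recist_category(baseline, nadir, 58.0)             # "SD" (below the PD bar)
print(irecist_category(62.0, v2, 58.0))                 # -> iSD (progression not confirmed)
```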