Diagnostic Performance and Interobserver Consistency of the Prostate Imaging Reporting and Data System Version 2: A Study on Six Prostate Radiologists with Different Experiences from Half a Year to 17 Years.
- Author: Zan KE¹; Liang WANG¹; Xiang-De MIN¹; Zhao-Yan FENG¹; Zhen KANG¹; Pei-Pei ZHANG¹; Ba-Sen LI¹; Hui-Juan YOU¹; Sheng-Chao HOU²
- Publication Type: Journal Article
- Keywords: Benign Prostatic Hyperplasia; Diagnosis; Magnetic Resonance Imaging; Prostate Cancer; Prostate Imaging Reporting and Data System Version 2
- From: Chinese Medical Journal 2018;131(14):1666-1673
- Country: China
- Language: English
- Abstract:
Background: One of the main aims of the updated Prostate Imaging Reporting and Data System Version 2 (PI-RADS v2) is to diminish variation in the interpretation and reporting of prostate imaging, especially among readers with varied experience levels. This study aimed to retrospectively analyze diagnostic consistency and accuracy for prostate disease among six radiologists with different experience levels from a single center and to evaluate the diagnostic performance of PI-RADS v2 scores in the detection of clinically significant prostate cancer (PCa).
Methods: From December 2014 to March 2016, 84 PCa patients and 99 benign prostatic hyperplasia patients who underwent 3.0T multiparametric magnetic resonance imaging before biopsy were included in our study. All patients were evaluated according to the PI-RADS v2 scale (scores of 1-5) by six blinded readers with 6 months and 2, 3, 4, 5, or 17 years of experience, respectively; the last reader was a reviewer/contributor for the PI-RADS v2. The correlation between the readers' scores and the Gleason score (GS) was determined with the Kendall test. Intra-/inter-observer agreement was evaluated using κ statistics, while receiver operating characteristic curve and area under the curve (AUC) analyses were performed to evaluate the diagnostic performance of the scores.
Results: Based on the PI-RADS v2, the median κ score and standard error among all possible pairs of readers were 0.506 and 0.043, respectively; the average correlation between the six readers' scores and the GS was positive, exhibiting weak-to-moderate strength (r = 0.391, P = 0.006). The AUC values of the six radiologists were 0.883, 0.924, 0.927, 0.932, 0.929, and 0.947, respectively.
Conclusion: The inter-reader agreement for the PI-RADS v2 among the six readers with different experience levels is weak to moderate. Reader experience affects the interpretation of magnetic resonance images.
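
The two summary statistics reported above, a median κ over all reader pairs and a per-reader AUC, can be illustrated with a minimal Python sketch. It assumes unweighted Cohen's κ and the rank-based (Mann-Whitney) form of the AUC; the abstract does not state which κ variant or software the authors used, so this is not their implementation. The function names and the toy scores are hypothetical and are not study data.

```python
from itertools import combinations
from statistics import median


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two readers' categorical scores.

    Assumption: the abstract does not say whether kappa was weighted.
    """
    n = len(a)
    categories = set(a) | set(b)
    # Observed proportion of cases where both readers gave the same score.
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance from each reader's marginal frequencies.
    p_exp = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if p_exp == 1.0:
        return 1.0  # both readers used a single identical category
    return (p_obs - p_exp) / (1 - p_exp)


def auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as one half (Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def median_pairwise_kappa(readers):
    """Median kappa over every possible pair of readers."""
    names = sorted(readers)
    return median(
        cohen_kappa(readers[i], readers[j]) for i, j in combinations(names, 2)
    )


# Hypothetical toy data: 1 = cancer, 0 = benign; scores on the 1-5 scale.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
readers = {
    "A": [5, 4, 4, 3, 2, 1, 2, 3],
    "B": [5, 4, 3, 3, 2, 1, 2, 2],
    "C": [4, 4, 4, 2, 3, 1, 2, 3],
}

if __name__ == "__main__":
    print("median pairwise kappa:", round(median_pairwise_kappa(readers), 3))
    for name, scores in readers.items():
        print(name, "AUC:", round(auc(scores, labels), 3))
```

With six readers there are 15 reader pairs, so a median over pairs summarizes agreement without letting one unusually discordant pair dominate.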