1.Effectiveness validation of a novel comprehensive classification for intertrochanteric fractures.
Lukuan CUI ; Hao LIU ; Jiangjing WANG ; Huanhuan FAN ; Dapeng WANG ; Shuhui WANG ; Chi SONG
Chinese Journal of Reparative and Reconstructive Surgery 2023;37(4):417-422
OBJECTIVE:
To validate the effectiveness of a novel comprehensive classification for intertrochanteric fracture (ITF).
METHODS:
The study included 616 patients with ITF: 279 males (45.29%) and 337 females (54.71%), aged 23 to 100 years (mean, 72.5 years). Two orthopaedic residents (observers Ⅰ and Ⅱ) and two senior orthopaedic surgeons (observers Ⅲ and Ⅳ) classified the CT imaging data of the 616 patients in random order, twice at an interval of 1 month, using the AO/Orthopaedic Trauma Association (AO/OTA) classification of 1996/2007 edition, the AO/OTA classification of 2018 edition, and the novel comprehensive classification. The Kappa consistency test was used to evaluate the intra-observer and inter-observer consistency of the three ITF classification systems.
RESULTS:
Across both rounds of evaluation by the 4 observers, all three classification systems showed strong inter-observer consistency. The κ value of the novel comprehensive classification was higher than those of the AO/OTA classifications of 1996/2007 edition and 2018 edition. Observer experience had some impact on the classification results: the inter-observer consistency of the orthopaedic residents was slightly better than that of the senior orthopaedic surgeons. For intra-observer consistency between the two evaluations, the novel comprehensive classification performed better for 3 of the 4 observers; the exception was observer Ⅳ, whose consistency with the AO/OTA classification of 2018 edition was slightly higher than with the novel comprehensive classification. Overall, the novel comprehensive classification showed higher repeatability, and the intra-observer consistency of the senior orthopaedic surgeons was better than that of the orthopaedic residents.
CONCLUSION:
The novel comprehensive classification system has good intra- and inter-observer consistency and high validity in the classification of CT images of ITF patients. Observer experience has some impact on the results of all three classification systems, and more experienced observers have higher intra-observer consistency.
Male
;
Female
;
Humans
;
Young Adult
;
Adult
;
Middle Aged
;
Aged
;
Aged, 80 and over
;
Observer Variation
;
Reproducibility of Results
;
Hip Fractures/surgery*
;
Tomography, X-Ray Computed/methods*
;
Radiography
2.Agreement evaluation of the severity of oral epithelial dysplasia in oral leukoplakia.
Jia Kuan PENG ; Hong Xia DAN ; Hao XU ; Xin ZENG ; Qianming CHEN
Chinese Journal of Stomatology 2022;57(9):921-926
Objective: To evaluate the inter-observer agreement on the severity of oral epithelial dysplasia in oral leukoplakia, providing a theoretical basis for the development of a more objective grading system. Methods: This study included 60 digital pathological slides of oral leukoplakia from the Oral Medicine Department of West China Hospital of Stomatology, Sichuan University, and 239 tissue microarray images of oral leukoplakia from the State Key Laboratory of Oral Diseases, Sichuan University, to evaluate the agreement of grading. In addition, 1,000 patches were generated from the 60 digital pathological slides and divided into 500 small-sized patches (224 × 224 pixels) and 500 large-sized patches (1,024 × 1,024 pixels) to evaluate the agreement of feature detection. Grading and feature detection were completed by three pathological experts from the oral pathology departments of two Grade 3, Class A stomatological hospitals in China. The Kappa coefficient was used to quantify the inter-observer agreement among pathologists. Results: Minimal agreement was found in the grading of oral epithelial dysplasia among pathologists (Kappa=0.30 in the pathological slide group, Kappa=0.30 in the tissue microarray group). No agreement was found in feature detection within the small-sized patches group (median Kappa=0.14 for architectural features, median Kappa=0.18 for cytological features), and minimal agreement was found in feature detection within the large-sized patches group (median Kappa=0.25 for architectural features, median Kappa=0.25 for cytological features). Conclusions: Generally, the agreement of grading and feature detection of oral epithelial dysplasia in oral leukoplakia is poor. Development of a more objective grading system for oral epithelial dysplasia based on artificial intelligence may help to improve the agreement.
Artificial Intelligence
;
China
;
Humans
;
Leukoplakia, Oral
;
Observer Variation
;
Precancerous Conditions
3.Inter-rater reliability of a composite health promotion scoring system developed in Singapore.
Manimegalai KAILASAM ; Priyanka VANKAYALAPATI ; Yin Maw HSANN ; Kok Soong YANG
Singapore medical journal 2022;63(2):93-96
INTRODUCTION:
In view of the important role of the environment in improving population health, implementation of health promotion programmes is recommended in living and working environments. Assessing the prevalence of such community health-promoting practices is important to identify gaps and make continuous and tangible improvements to health-promoting environments. We aimed to evaluate the inter-rater reliability of a composite scorecard used to assess the prevalence of community health-promoting practices in Singapore.
METHODS:
Inter-rater reliability for the use of the composite health promotion scorecards was evaluated in eight residential zones in the western region of Singapore. The assessment involved three raters, and each zone was evaluated by two raters. Health-promoting practices in residential zones were assessed based on 44 measurable elements under five domains in the composite scorecard (community support and resources, healthy behaviours, chronic conditions, mental health and common medical emergencies), and agreement between raters was quantified using weighted kappa. The strength of agreement was determined based on Landis and Koch's classification method.
RESULTS:
A high degree of agreement (almost perfect-to-perfect) was observed between both raters for the measurable elements from most domains and subdomains. An exception was observed for the community support and resources domain, where there was a lower degree of agreement between the raters for a few elements.
CONCLUSION:
The composite scorecard demonstrated a high degree of reliability and yielded similar scores for the same residential zone, even when used by different raters.
Health Promotion
;
Humans
;
Observer Variation
;
Public Health
;
Reproducibility of Results
;
Singapore
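The weighted kappa and Landis and Koch grading used in the scorecard study above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say whether linear or quadratic weights were used, so both are offered, and the function names and example ratings are hypothetical rather than taken from the study.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Cohen's weighted kappa for two raters scoring the same items.

    `categories` is the ordered list of possible scores (at least two).
    The weighting scheme is an assumption; the abstract does not specify it.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions of (rater A score, rater B score).
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        obs[index[a]][index[b]] += 1.0 / n

    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed = sum(disagreement(i, j) * obs[i][j]
                   for i in range(k) for j in range(k))
    expected = sum(disagreement(i, j) * row[i] * col[j]
                   for i in range(k) for j in range(k))
    if expected == 0:
        # Both raters used a single identical category; kappa is undefined,
        # reported here as perfect agreement by convention of this sketch.
        return 1.0
    return 1.0 - observed / expected


def landis_koch(kappa):
    """Landis and Koch (1977) verbal label for a kappa value."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical scores from two raters for four elements of one zone.
scores_a = [0, 1, 2, 0]
scores_b = [0, 1, 2, 1]
kappa = weighted_kappa(scores_a, scores_b, categories=[0, 1, 2])
label = landis_koch(kappa)
```

Weighted kappa penalises a one-step disagreement less than a two-step one, which suits ordinal scorecard items; unweighted kappa would treat both as equally wrong.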
4.Diagnostic consistency for observing endodontic files in digital radiographs displayed on different electronic devices.
Chinese Journal of Stomatology 2022;57(4):384-389
Objectives: To evaluate the diagnostic consistency of working lengths by observing endodontic files in root canals and periapical subtle structures in digital intraoral radiographs displayed on two smartphones, a tablet and a laptop computer. Methods: A dried human skull embedded in an acrylic compound was used for exposing radiographs of the upper and lower second premolars and first molars with two endodontic files (Kerr files size 10 and 15) positioned to the full length of the roots or 1.5 mm short of the apexes. A total of 100 radiographs were taken for each of the file sizes. Five observers were asked to assess all 200 digital radiographs according to a 5-category scale on smartphone A (HUAWEI P9 Plus), smartphone B (Apple iPhone 7), a tablet (Apple iPad 2018) and a laptop computer (Lenovo Thinkpad E480), respectively. The gold standard for receiver operating characteristic (ROC) curve analysis was determined with the endodontic Kerr file size 20. A total of 150 roots with files were radiographed for each endodontic file size, 75 with files reaching the radiographic apexes of the respective roots and 75 with files 1.5 mm short of the radiographic apexes. Results from the ROC analysis were analyzed with one-way ANOVA and the independent-sample t test. Results: For the Kerr file size 10, the areas under the ROC curve for the laptop, the tablet and the two smartphones were 0.891±0.037, 0.869±0.037, 0.870±0.017 and 0.849±0.037, while for the Kerr file size 15 the areas under the ROC curve were 0.957±0.02, 0.961±0.02, 0.961±0.01 and 0.961±0.02, respectively. There were no significant differences in diagnostic accuracy for observing endodontic file positions among digital radiographs displayed on the two smartphones, the tablet and the laptop (endodontic file size 10: F=1.39, P=0.281; endodontic file size 15: F=0.05, P=0.985). A significant difference was found in the diagnostic accuracy of endodontic file positions between size 10 and size 15 files across the display devices (t=-10.65, P<0.001). Conclusions: There was high diagnostic consistency in the determination of working length and periapical subtle structures of roots by observing digital radiographs displayed on smartphones, a tablet and a laptop computer.
Dental Instruments
;
Dental Pulp Cavity/diagnostic imaging*
;
Electronics
;
Humans
;
Molar
;
Observer Variation
;
Root Canal Preparation
5.Reproducibility Analysis of Iodine Concentrations of Abdominal Parenchymal Organs Based on Spectral CT.
Qing Lin MENG ; Huan XU ; Lin Xiong ZONG ; Meng Qi LIU ; Zhi Ye CHEN
Acta Academiae Medicinae Sinicae 2021;43(1):57-62
Objective To investigate the intra- and inter-observer reproducibility of iodine concentrations of abdominal parenchymal organs based on spectral CT. Methods The water-free iodine images of the venous phase were retrospectively obtained from 50 patients with abdominal dynamic spectral CT scans. The iodine concentrations were measured in the left, right and caudate lobes of liver, spleen, pancreas and bilateral kidneys. Intraclass correlation coefficient (ICC) and Bland-Altman plots were employed to analyze the intra- and inter-observer reproducibility. Results The intra-observer ICCs of the left, right and caudate lobes of liver, spleen, pancreas, and left and right kidneys were 0.938 (0.894, 0.965), 0.932 (0.884, 0.961), 0.939 (0.895, 0.965), 0.947 (0.909, 0.970), 0.912 (0.851, 0.949), 0.946 (0.906, 0.969) and 0.907 (0.842, 0.946), which indicated good intra-observer reproducibility. The inter-observer ICCs of the left, right and caudate lobes of liver, spleen, pancreas, and left and right kidneys were 0.947 (0.909, 0.970), 0.927 (0.875, 0.958), 0.943 (0.902, 0.968), 0.956 (0.924, 0.975), 0.934 (0.887, 0.962), 0.927 (0.875, 0.958) and 0.892 (0.818, 0.937), which indicated good inter-observer reproducibility. Bland-Altman plots showed that more than 95% of the points for the intra-observer differences fell within the 95% CI of the limits of agreement for the caudate lobe of liver, spleen, pancreas and bilateral kidneys; the same held for the inter-observer differences of the caudate lobe of liver, spleen and right kidney. Conclusion The iodine concentration measurement based on spectral CT showed good intra- and inter-observer reproducibility for the caudate lobe of liver and spleen.
Humans
;
Iodine
;
Observer Variation
;
Reproducibility of Results
;
Retrospective Studies
;
Tomography, X-Ray Computed
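The Bland-Altman check described in the spectral CT study above can be illustrated with a minimal sketch. It assumes the common convention of limits of agreement at the mean difference plus or minus 1.96 standard deviations; the function name and the iodine concentration values are made up for illustration and are not data from the study.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Return bias, lower and upper limits of agreement, and the fraction
    of paired differences that fall within those limits.

    `first` and `second` are paired measurements of the same regions
    (two readings by one observer, or readings by two observers).
    """
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    lower = bias - 1.96 * sd
    upper = bias + 1.96 * sd
    inside = sum(lower <= d <= upper for d in diffs) / len(diffs)
    return bias, lower, upper, inside


# Hypothetical iodine concentrations for four regions, measured twice.
reading_1 = [1.0, 2.0, 3.0, 4.0]
reading_2 = [1.1, 1.9, 3.2, 3.8]
bias, lower, upper, inside = bland_altman(reading_1, reading_2)
```

A bias near zero with narrow limits suggests the two sets of measurements are interchangeable; the ICC reported in the abstract is a separate statistic and is not computed here.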
6.Inter- and intra-observer variability for the assessment of coronary artery tree description and lesion EvaluaTion (CatLet©) angiographic scoring system in patients with acute myocardial infarction.
Jin-Mei LIU ; Yang HE ; Ruo-Ling TENG ; Xiao-Dong QIAN ; Yun-Lang DAI ; Jian-Ping XU ; Xin ZHAO ; Ting-Bo JIANG ; Yong-Ming HE
Chinese Medical Journal 2020;134(4):425-430
BACKGROUND:
Previously, we developed a novel Coronary Artery Tree description and Lesion EvaluaTion (CatLet©) angiographic scoring system, which was capable of accounting for the variability in the coronary anatomy and assisting in the risk-stratification of patients with acute myocardial infarction (AMI). Our preliminary study revealed that the CatLet score better predicted clinical outcomes for AMI patients than the Synergy between Percutaneous Coronary Intervention with Taxus and Cardiac Surgery score. However, the inter- and intra-observer reproducibility of the CatLet score remains to be evaluated.
METHODS:
A total of 30 consecutive AMI patients, admitted in September of 2015, were independently assessed by two experienced interventional cardiologists to evaluate the inter-observer reproducibility of the CatLet score. Another set of 49 consecutive AMI patients, admitted between September and October in 2014, were assessed by one of the two interventional cardiologists on two occasions 3 months apart to evaluate the intra-observer reproducibility of the CatLet score. The weighted kappa was used to express the degree of agreement.
RESULTS:
The weighted kappa values (95% confidence interval) for the intra- and inter-observer reproducibility of the CatLet Score were 0.82 (0.59-1.00, Z = 7.23, P < 0.001) and 0.86 (0.54-1.00, Z = 5.20, P < 0.001), respectively, according to the tertile analysis (≤14, 15-22, >22). Regarding the adverse characteristics pertinent to lesions and dominance parameters, the kappa values for the inter-observer variability were 0.80 (0.56-1.00, Z = 6.47, P < 0.001) for total number of lesions, 0.57 (0.28-0.85, Z = 3.03, P < 0.001) for bifurcation, 0.69 (0.43-0.96, Z = 5.06, P < 0.001) for heavy calcification, 1.00 (0.72-1.00, Z = 6.93, P < 0.001) for tortuosity, 0.54 (0.26-0.82, Z = 3.78, P < 0.001) for thrombus, 0.69 (0.48-0.91, Z = 6.29, P < 0.001) for right coronary artery dominance, 0.69 (0.41-0.96, Z = 4.91, P < 0.001) for left anterior descending artery length, and 0.22 (0.06-0.51, Z = 1.56, P = 0.06) for diagonal size. Equivalent values for the intra-observer variability were moderate to almost perfect (range 0.54-1.00).
CONCLUSIONS:
The reproducibility of the CatLet angiographic scoring system for evaluation of the coronary angiograms ranged from substantial to excellent. The high reproducibility of the CatLet angiographic scoring system will boost its clinical application to patients with AMI.
Coronary Angiography
;
Coronary Artery Disease
;
Humans
;
Myocardial Infarction/diagnostic imaging*
;
Observer Variation
;
Reproducibility of Results
;
Treatment Outcome
;
Trees
7.Feasibility of fully automated classification of whole slide images based on deep learning
Kyung Ok CHO ; Sung Hak LEE ; Hyun Jong JANG
The Korean Journal of Physiology and Pharmacology 2020;24(1):89-99
Although microscopic analysis of tissue slides has been the basis for disease diagnosis for decades, intra- and inter-observer variabilities remain issues to be resolved. The recent introduction of digital scanners has allowed the use of deep learning in the analysis of tissue images because many whole slide images (WSIs) are accessible to researchers. In the present study, we investigated the possibility of a deep learning-based, fully automated, computer-aided diagnosis system with WSIs from a stomach adenocarcinoma dataset. Three different convolutional neural network architectures were tested to determine the best architecture for the tissue classifier. Each network was trained to classify small tissue patches as normal or tumor. Based on the patch-level classification, tumor probability heatmaps can be overlaid on tissue images. We observed three different tissue patterns: clear normal, clear tumor and ambiguous cases. We suggest that longer inspection time can be assigned to ambiguous cases than to clear normal cases, increasing the accuracy and efficiency of histopathologic diagnosis by pre-evaluating the status of the WSIs. When the classifier was tested with a completely different WSI dataset, the performance was not optimal because of the different tissue preparation quality. When a small amount of data from the new dataset was included in training, the performance on the new dataset improved considerably. These results indicate that a WSI dataset should include tissues prepared under many different conditions in order to construct a generalized tissue classifier. Thus, multi-national/multi-center datasets should be built for the application of deep learning in real-world medical practice.
Adenocarcinoma
;
Classification
;
Dataset
;
Diagnosis
;
Learning
;
Observer Variation
;
Stomach
8.Large Variation in Clinical Practice amongst Pediatricians in Treating Children with Recurrent Abdominal Pain
Michael W VAN KALLEVEEN ; Elise J NOORDHUIS ; Carole LASHAM ; Frans B PLÖTZ
Pediatric Gastroenterology, Hepatology & Nutrition 2019;22(3):225-232
PURPOSE: To evaluate intra- and inter-observer variability and guideline adherence amongst pediatricians in treating children aged between 4 and 18 years referred with recurrent abdominal pain (RAP) without red flags. METHODS: The first part of the study is a retrospective single-center cohort study. The diagnostic work-ups of eight pediatricians were compared to the national guidelines. Intra- and inter-observer variability were examined by Cramer's V test. Intra-observer variability was defined as the amount of variation within a pediatrician and inter-observer variability as the amount of variation between pediatricians in the application of diagnostic work-up in children with RAP. Prospectively, the same pediatricians were requested to provide a report on their management strategy for a fictitious case, to assess similarity with the retrospective diagnostic work-up. RESULTS: A total of 10 patients per pediatrician were analyzed. Retrospectively, a (very) weak association between pediatricians' diagnostic work-ups was found (0.22), which implies high inter-observer variability. The intra-observer association in diagnostic work-up was moderate (range, 0.35–0.46). The Cramer's V of 0.60 in diagnostic work-up between pediatricians in the fictitious case implied a moderately strong association and lower inter-observer variability than in the retrospective study. Adherence to the guideline was 66.8%. CONCLUSION: We found high intra- and inter-observer variability and moderate guideline adherence in daily clinical practice amongst pediatricians in treating children with RAP in a teaching hospital.
Abdominal Pain
;
Child
;
Cohort Studies
;
Guideline Adherence
;
Hospitals, Teaching
;
Humans
;
Observer Variation
;
Prospective Studies
;
Retrospective Studies
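Cramér's V, the association measure used in the recurrent abdominal pain study above, can be computed from a contingency table as in the sketch below. The table layout (one row per pediatrician, one column per work-up choice) and its counts are hypothetical; the abstract does not describe how the authors arranged their data.

```python
import math


def cramers_v(table):
    """Cramér's V for an r x c contingency table of counts.

    V = sqrt(chi2 / (n * (min(r, c) - 1))), ranging from 0 (no association)
    to 1 (perfect association). Requires at least two rows and two columns.
    """
    rows, cols = len(table), len(table[0])
    n = sum(sum(r) for r in table)
    row_tot = [sum(r) for r in table]
    col_tot = [sum(table[i][j] for i in range(rows)) for j in range(cols)]

    chi2 = 0.0
    for i in range(rows):
        for j in range(cols):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:  # skip cells in empty rows or columns
                chi2 += (table[i][j] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(rows, cols) - 1)))


# Hypothetical counts: rows are two pediatricians, columns are
# "test ordered" versus "test not ordered" across 10 patients each.
same_habits = [[5, 5], [5, 5]]
opposite_habits = [[10, 0], [0, 10]]
v_same = cramers_v(same_habits)
v_opposite = cramers_v(opposite_habits)
```

In this layout a high V means that work-up choices depend strongly on which pediatrician is ordering, so how a given value maps onto "variability" depends on how the table is built.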
9.Inter-rater agreement of Korean Triage and Acuity Scale between emergency physicians and nurses
Hyung Il KIM ; Seong Beom OH ; Han Joo CHOI
Journal of the Korean Society of Emergency Medicine 2019;30(4):309-317
OBJECTIVE: The Korean Triage and Acuity Scale (KTAS) has been used in all emergency departments (EDs) since 2016. Medical personnel can assign treatment priority based on the KTAS levels. The inter-rater agreement of KTAS has not been reported, even though most triage assignments are performed by nurses in Korea. This study aimed to verify the agreement of triage levels between emergency physicians (EPs) and nurses using KTAS. METHODS: This was a prospective, single-center study at an academic tertiary medical center. When a patient visited the ED, the triage nurse and the EP saw the patient together. The nurse performed the history taking and physical examination, including vital sign measurements, and then recorded the KTAS level. The EP did not interfere with the nurse's decision and independently decided the KTAS level. The designated codes and levels were compared. If there was a discrepancy, the EP recorded the detailed reasons for the disagreement. RESULTS: Comparisons were performed with 928 patients. The number of patients in each KTAS level was 95 (10.2%) in level I, 263 (28.3%) in level II, 348 (37.5%) in level III, 144 (15.5%) in level IV, and 78 (8.4%) in level V. The overall agreement was 761 (82%), and the Kappa coefficient was 0.691. Among the disagreements, errors in history taking were the most frequent cause (131, 78.4%). Insufficient understanding of the disease pathophysiology, inaccurate neurological examinations, and errors that did not consider the vital signs except for the blood pressure were encountered in 12 (7.2%). CONCLUSION: The agreement rate was high between EPs and nurses using KTAS (K=0.691, substantial agreement).
Blood Pressure
;
Emergencies
;
Emergency Service, Hospital
;
Humans
;
Korea
;
Neurologic Examination
;
Observer Variation
;
Physical Examination
;
Prospective Studies
;
Triage
;
Vital Signs
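The percentages reported in the KTAS study above can be reproduced from the stated counts with simple arithmetic. The sketch below uses only numbers from the abstract; the one assumption is that the error percentages (78.4% and 7.2%) are fractions of the disagreeing cases rather than of all patients, which is what the arithmetic suggests.

```python
# Patient counts per KTAS level, as reported in the abstract.
level_counts = {"I": 95, "II": 263, "III": 348, "IV": 144, "V": 78}
total = sum(level_counts.values())
agreed = 761
disagreed = total - agreed


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


level_pct = {level: pct(n, total) for level, n in level_counts.items()}
overall_agreement = pct(agreed, total)

# Assumption: these two counts are expressed relative to disagreements.
history_error_pct = pct(131, disagreed)
other_error_pct = pct(12, disagreed)
```

The Kappa coefficient of 0.691 cannot be recomputed from the abstract alone, because it requires the full cross-tabulation of nurse and EP levels.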
10.Interpretive Performance and Inter-Observer Agreement on Digital Mammography Test Sets
Sung Hun KIM ; Eun Hye LEE ; Jae Kwan JUN ; You Me KIM ; Yun Woo CHANG ; Jin Hwa LEE ; Hye Won KIM ; Eun Jung CHOI ;
Korean Journal of Radiology 2019;20(2):218-224
OBJECTIVE: To evaluate the interpretive performance and inter-observer agreement on digital mammograms among radiologists and to investigate whether radiologist characteristics affect performance and agreement. MATERIALS AND METHODS: The test sets consisted of full-field digital mammograms and contained 12 cancer cases among 1000 total cases. Twelve radiologists independently interpreted all mammograms. Performance indicators included the recall rate, cancer detection rate (CDR), positive predictive value (PPV), sensitivity, specificity, false positive rate (FPR), and area under the receiver operating characteristic curve (AUC). Inter-radiologist agreement was measured. The reported radiologist characteristics included number of years of experience interpreting mammography, fellowship training in breast imaging, and annual volume of mammography interpretation. RESULTS: The mean and range of interpretive performance were as follows: recall rate, 7.5% (3.3–10.2%); CDR, 10.6 per 1000 examinations (8.0–12.0); PPV, 15.9% (8.8–33.3%); sensitivity, 88.2% (66.7–100%); specificity, 93.5% (90.6–97.8%); FPR, 6.5% (2.2–9.4%); and AUC, 0.93 (0.82–0.99). Radiologists who annually interpreted more than 3000 screening mammograms tended to exhibit higher CDRs and sensitivities than those who interpreted fewer than 3000 mammograms (p = 0.064). The inter-radiologist agreement showed a percent agreement of 77.2–88.8% and a kappa value of 0.27–0.34. Radiologist characteristics did not affect agreement. CONCLUSION: The interpretive performance of the radiologists fulfilled the mammography screening goal of the American College of Radiology, although there was inter-observer variability. Radiologists who interpreted more than 3000 screening mammograms annually tended to perform better than radiologists who did not.
Area Under Curve
;
Breast
;
Fellowships and Scholarships
;
Mammography
;
Mass Screening
;
Medical Audit
;
Observer Variation
;
ROC Curve
;
Sensitivity and Specificity
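The performance indicators listed in the mammography study above follow from a reader's true and false positive and negative counts. The sketch below uses standard screening definitions; the counts for the example reader are hypothetical, chosen only to match the test set size of 12 cancers among 1000 cases, and are not results from the study.

```python
from dataclasses import dataclass


@dataclass
class ReaderCounts:
    tp: int  # cancers recalled
    fp: int  # non-cancers recalled
    fn: int  # cancers not recalled
    tn: int  # non-cancers not recalled


def screening_metrics(c):
    """Standard screening indicators for one reader.

    Assumes at least one recall, one cancer and one non-cancer case,
    so that no denominator is zero.
    """
    n = c.tp + c.fp + c.fn + c.tn
    recalled = c.tp + c.fp
    return {
        "recall_rate": recalled / n,
        "cdr_per_1000": 1000 * c.tp / n,
        "ppv": c.tp / recalled,
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "fpr": c.fp / (c.tn + c.fp),
    }


# Hypothetical reader: 11 of 12 cancers found, 64 false recalls.
reader = ReaderCounts(tp=11, fp=64, fn=1, tn=924)
metrics = screening_metrics(reader)
```

Because FPR is one minus specificity, the reported mean specificity of 93.5% and mean FPR of 6.5% are two views of the same quantity.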
