1.Performance of GPT-3.5 and GPT-4 on standardized urology knowledge assessment items in the United States: a descriptive study
Max Samuel YUDOVICH ; Elizaveta MAKAROVA ; Christian Michael HAGUE ; Jay Dilip RAMAN
Journal of Educational Evaluation for Health Professions 2024;21(1):17-
Purpose:
This study aimed to evaluate the performance of Chat Generative Pre-Trained Transformer (ChatGPT) on standardized urology multiple-choice items in the United States.
Methods:
In total, 700 multiple-choice urology board exam-style items were submitted to GPT-3.5 and GPT-4, and responses were recorded. Items were categorized based on topic and question complexity (recall, interpretation, and problem-solving). The accuracy of GPT-3.5 and GPT-4 was compared across item types in February 2024.
Results:
GPT-4 answered 44.4% of items correctly, compared with 30.9% for GPT-3.5 (P<0.00001). GPT-4 had higher accuracy than GPT-3.5 on urologic oncology (43.8% vs. 33.9%, P=0.03), sexual medicine (44.3% vs. 27.8%, P=0.046), and pediatric urology (47.1% vs. 27.1%, P=0.012) items, whereas the differences for endourology (38.0% vs. 25.7%, P=0.15), reconstruction and trauma (29.0% vs. 21.0%, P=0.41), and neurourology (49.0% vs. 33.3%, P=0.11) items were not statistically significant. By complexity, GPT-4 outperformed GPT-3.5 on recall (45.9% vs. 27.4%, P<0.00001) and interpretation (45.6% vs. 31.5%, P=0.0005) items, but the difference for the higher-complexity problem-solving items (41.8% vs. 34.5%, P=0.56) was not significant.
Conclusions:
ChatGPT performs relatively poorly on standardized multiple-choice urology board exam-style items, with GPT-4 outperforming GPT-3.5. The accuracy of both models was below the proposed minimum passing standard (60%) for the American Board of Urology’s Continuing Urologic Certification knowledge reinforcement activity. As artificial intelligence models continue to advance, ChatGPT may become more capable and accurate on board examination items; for now, its responses should be scrutinized.
2.House dust mites in human ear
Alazzawi, S., Lynn, E.L.Y., Wee, C.A. and Raman, R.
Tropical Biomedicine 2016;33(2):393-395
A study was carried out to investigate the presence of mites in the human ear in 58 patients (113 ears). Ear scrapings were examined under the microscope by a parasitologist for the presence of house dust mites. House dust mites were present in 8 (7.1%) ears. We conclude that mites are normal commensals of the external ear in tropical countries.
3.Holoprosencephaly: an antenatally-diagnosed case series and subject review.
Alvin S T LIM ; Tse Hui LIM ; Su Keyau KEE ; Patrick CHIA ; Subramaniam RAMAN ; Elizabeth L P EU ; Jessie Y C LIM ; Sim Leng TIEN
Annals of the Academy of Medicine, Singapore 2008;37(7):594-597
Introduction:
Holoprosencephaly (HPE) is an uncommon congenital failure of forebrain development. Although the aetiology is heterogeneous, chromosomal abnormalities or a monogenic defect are the major causes, accounting for about 40% to 50% of HPE cases. At least 7 genes have been positively implicated, including SHH, ZIC2, SIX3, TGIF, PTCH1, GLI2, and TDGF1.
Clinical Picture:
Twelve antenatally- and 1 postnatally-diagnosed cases are presented in this study. The specimens comprised 6 amniotic fluid, 3 chorionic villus, 2 fetal blood, 1 peripheral blood, and 1 product of conception samples.
Outcome:
The total chromosome abnormality rate was 92.3%, comprising predominantly trisomy 13 (66.7%). There was 1 case of trisomy 18 and 3 cases of structural abnormalities, including del13q, del18p, and add4q.
Conclusion:
Despite the poor outcome of antenatally-diagnosed HPE and the likelihood that parents will opt for termination of pregnancy, karyotyping and/or genetic studies should be performed to determine whether a specific familial genetic or chromosomal abnormality is the cause. At the very least, a detailed chromosome analysis should be carried out on the affected individual. If the result of high-resolution karyotyping is normal, fluorescence in situ hybridisation (FISH) and/or syndrome-specific testing or isolated holoprosencephaly genetic testing may be performed. This information can be useful in making a prognosis and predicting the risk of recurrence.
Adult ; Chromosome Aberrations ; Female ; Holoprosencephaly/diagnosis ; Holoprosencephaly/genetics ; Humans ; Karyotyping ; Pregnancy ; Prenatal Diagnosis ; Trisomy
4.Sengstaken-Blakemore tube to control massive postpartum haemorrhage.
The Medical Journal of Malaysia 2003;58(4):604-607
Massive postpartum haemorrhage after Cesarean section for placenta previa is a common occurrence. The bleeding is usually from the placental bed at the lower uterine segment. Uterine tamponade has a role in the management of such patients, especially when fertility is desired. We describe here a case of massive postpartum haemorrhage that was managed with the use of a Sengstaken-Blakemore tube. This allowed us to avoid a hysterectomy in a young primiparous patient.
Balloon Dilatation/*instrumentation ; Cesarean Section/adverse effects ; Postpartum Hemorrhage/etiology ; Postpartum Hemorrhage/*therapy
