1.Use of ChatGPT for Determining Clinical and Surgical Treatment of Lumbar Disc Herniation With Radiculopathy: A North American Spine Society Guideline Comparison
Mateo Restrepo MEJIA ; Juan Sebastian ARROYAVE ; Michael SATURNO ; Laura Chelsea Mazudie NDJONKO ; Bashar ZAIDAT ; Rami RAJJOUB ; Wasil AHMED ; Ivan ZAPOLSKY ; Samuel K. CHO
Neurospine 2024;21(1):149-158
Objective:
Large language models like chat generative pre-trained transformer (ChatGPT) have found success in various sectors, but their application in the medical field remains limited. This study aimed to assess the feasibility of using ChatGPT to provide accurate medical information to patients, specifically evaluating how well ChatGPT versions 3.5 and 4 aligned with the 2012 North American Spine Society (NASS) guidelines for lumbar disk herniation with radiculopathy.
Methods:
ChatGPT's responses to questions based on the NASS guidelines were analyzed for accuracy. To deepen the analysis, three new categories were introduced: overconclusiveness, supplementary information, and incompleteness. Overconclusiveness referred to recommendations not mentioned in the NASS guidelines, supplementary information denoted additional relevant details, and incompleteness indicated that crucial information from the NASS guidelines had been omitted.
Results:
Out of 29 clinical guidelines evaluated, ChatGPT-3.5 was accurate in 15 responses (52%) and ChatGPT-4 in 17 responses (59%). ChatGPT-3.5 was overconclusive in 14 responses (48%) and ChatGPT-4 in 13 responses (45%). ChatGPT-3.5 provided supplementary information in 24 responses (83%) and ChatGPT-4 in 27 responses (93%). ChatGPT-3.5 was incomplete in 11 responses (38%) and ChatGPT-4 in 8 responses (28%).
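Each percentage in these results is a response count divided by the 29 guideline questions. The short Python check recomputes the shares from the reported counts; the `COUNTS` layout and the `percent` helper are illustrative names, not part of the study.

```python
# Counts as reported in the abstract; the denominator for every share
# is the 29 NASS guideline questions that were evaluated.
N_QUESTIONS = 29

COUNTS = {
    "accurate": {"ChatGPT-3.5": 15, "ChatGPT-4": 17},
    "overconclusive": {"ChatGPT-3.5": 14, "ChatGPT-4": 13},
    "supplementary": {"ChatGPT-3.5": 24, "ChatGPT-4": 27},
    "incomplete": {"ChatGPT-3.5": 11, "ChatGPT-4": 8},
}


def percent(count: int, total: int = N_QUESTIONS) -> int:
    """Share of responses expressed as a whole-number percentage."""
    return round(100 * count / total)


if __name__ == "__main__":
    # Print one line per category and model, e.g. "accurate  ChatGPT-3.5  15/29 = 52%".
    for category, by_model in COUNTS.items():
        for model, n in by_model.items():
            print(f"{category:<15}{model:<13}{n:>2}/{N_QUESTIONS} = {percent(n)}%")
```

Note that the categories are not mutually exclusive: a single response can be both accurate and supplementary, so the shares within one model do not sum to 100%.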
Conclusion:
ChatGPT shows promise for clinical decision-making, but both patients and healthcare providers should exercise caution to ensure safety and quality of care. While these results are encouraging, further research is necessary to validate the use of large language models in clinical settings.
4.Risk of Incident Cancer in Veterans with Diabetes Who Use Metformin Versus Sulfonylureas
Maya M. ABDALLAH ; Beatriz Desanti de OLIVEIRA ; Clark DUMONTIER ; Ariela R. ORKABY ; Lisa NUSSBAUM ; Michael GAZIANO ; Luc DJOUSSE ; David GAGNON ; Kelly CHO ; Sarah R. PREIS ; Jane A. DRIVER
Journal of Cancer Prevention 2024;29(4):140-147
Prior research suggests that metformin has anti-cancer effects, yet data are limited. We examined the association between diabetes treatment (metformin versus sulfonylurea) and the risk of incident diabetes-related and non-diabetes-related cancers in US veterans.

This retrospective cohort study included cancer-free US veterans aged ≥ 55 years who were new users of metformin or sulfonylureas for diabetes between 2001 and 2012. Cox proportional hazards models with propensity score-matched inverse probability of treatment weighting (IPTW) were constructed.

A total of 88,713 veterans (mean age 68.6 ± 7.8 years; 97.7% male; 84.1% White, 12.6% Black, 3.3% other race) were followed for a mean of 4.2 ± 3.0 years. Among metformin users (n = 60,476), there were 858 incident diabetes-related cancers (crude incidence rate [IR] = 3.4 per 1,000 person-years) and 3,533 non-diabetes-related cancers (IR = 14.1). Among sulfonylurea users (n = 28,237), there were 675 incident diabetes-related cancers (IR = 5.5) and 2,316 non-diabetes-related cancers (IR = 18.9). After IPTW adjustment, metformin use was associated with a lower risk of incident diabetes-related cancer than sulfonylurea use (hazard ratio [HR] = 0.66, 95% CI 0.58-0.75). There was no association between treatment group (metformin versus sulfonylurea) and non-diabetes-related cancer (HR = 0.96, 95% CI 0.89-1.02). Among diabetes-related cancers, metformin users had a lower incidence of liver (HR = 0.39, 95% CI 0.28-0.53), colorectal (HR = 0.75, 95% CI 0.62-0.92), and esophageal cancer (HR = 0.54, 95% CI 0.36-0.81).

Among US veterans, metformin users had a lower incidence of diabetes-related cancer, particularly liver, colorectal, and esophageal cancers, than sulfonylurea users. Metformin use was not associated with non-diabetes-related cancer. Further studies are needed to understand how metformin use affects cancer incidence in different patient populations.
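Two quantities in this abstract have simple definitions: the crude incidence rate (events divided by person-time, scaled to 1,000 person-years) and the IPTW weight derived from a propensity score. A minimal Python sketch of both follows, assuming the common unstabilized average-treatment-effect (ATE) weights; the abstract does not say whether the study used this form, stabilized weights, or another variant, and the function names are hypothetical. Fitting the weighted Cox model itself is left out.

```python
def crude_incidence_rate(events: int, person_years: float, per: float = 1_000.0) -> float:
    """Crude (unadjusted) incidence rate: events per `per` person-years of follow-up."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return events / person_years * per


def iptw_ate_weight(treated: bool, propensity: float) -> float:
    """Unstabilized inverse probability of treatment weight targeting the ATE.

    `propensity` is the estimated probability of receiving the index drug
    (metformin here) given baseline covariates, e.g. from a logistic model.
    Treated patients receive 1/e; comparator patients receive 1/(1 - e).
    """
    if not 0.0 < propensity < 1.0:
        raise ValueError("propensity must lie strictly between 0 and 1")
    return 1.0 / propensity if treated else 1.0 / (1.0 - propensity)


if __name__ == "__main__":
    # Hypothetical inputs, not data from the study.
    print(crude_incidence_rate(events=50, person_years=10_000))  # 5.0 per 1,000 person-years
    print(iptw_ate_weight(treated=True, propensity=0.75))   # about 1.33
    print(iptw_ate_weight(treated=False, propensity=0.75))  # 4.0
```

Weighting patients this way makes the two treatment groups resemble the full cohort with respect to the covariates in the propensity model. That is why the adjusted HRs can differ from what the crude IRs alone would suggest.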
9.Consensus and Diversity in the Management of Varicocele for Male Infertility: Results of a Global Practice Survey and Comparison with Guidelines and Recommendations
Rupin SHAH ; Ashok AGARWAL ; Parviz KAVOUSSI ; Amarnath RAMBHATLA ; Ramadan SALEH ; Rossella CANNARELLA ; Ahmed M. HARRAZ ; Florence BOITRELLE ; Shinnosuke KURODA ; Taha Abo-Almagd Abdel-Meguid HAMODA ; Armand ZINI ; Edmund KO ; Gokhan CALIK ; Tuncay TOPRAK ; Hussein KANDIL ; Murat GÜL ; Mustafa Emre BAKIRCIOĞLU ; Neel PAREKH ; Giorgio Ivan RUSSO ; Nicholas TADROS ; Ates KADIOGLU ; Mohamed ARAFA ; Eric CHUNG ; Osvaldo RAJMIL ; Fotios DIMITRIADIS ; Vineet MALHOTRA ; Gianmaria SALVIO ; Ralf HENKEL ; Tan V. LE ; Emrullah SOGUTDELEN ; Sarah VIJ ; Abdullah ALARBID ; Ahmet GUDELOGLU ; Akira TSUJIMURA ; Aldo E. CALOGERO ; Amr El MELIEGY ; Andrea CRAFA ; Arif KALKANLI ; Aykut BASER ; Berk HAZIR ; Carlo GIULIONI ; Chak-Lam CHO ; Christopher C.K. HO ; Ciro SALZANO ; Daniel Suslik ZYLBERSZTEJN ; Dung Mai Ba TIEN ; Edoardo PESCATORI ; Edson BORGES ; Ege Can SEREFOGLU ; Emine SAÏS-HAMZA ; Eric HUYGHE ; Erman CEYHAN ; Ettore CAROPPO ; Fabrizio CASTIGLIONI ; Fahmi BAHAR ; Fatih GOKALP ; Francesco LOMBARDO ; Franco GADDA ; Gede Wirya Kusuma DUARSA ; Germar-Michael PINGGERA ; Gian Maria BUSETTO ; Giancarlo BALERCIA ; Gianmartin CITO ; Gideon BLECHER ; Giorgio FRANCO ; Giovanni LIGUORI ; Haitham ELBARDISI ; Hakan KESKIN ; Haocheng LIN ; Hisanori TANIGUCHI ; Hyun Jun PARK ; Imad ZIOUZIOU ; Jean de la ROSETTE ; Jim HOTALING ; Jonathan RAMSAY ; Juan Manuel Corral MOLINA ; Ka Lun LO ; Kadir BOCU ; Kareim KHALAFALLA ; Kasonde BOWA ; Keisuke OKADA ; Koichi NAGAO ; Koji CHIBA ; Lukman HAKIM ; Konstantinos MAKAROUNIS ; Marah HEHEMANN ; Marcelo Rodriguez PEÑA ; Marco FALCONE ; Marion BENDAYAN ; Marlon MARTINEZ ; Massimiliano TIMPANO
The World Journal of Men's Health 2023;41(1):164-197
Purpose:
Varicocele is a common problem among infertile men. Varicocele repair (VR) is frequently performed to improve semen parameters and the chances of pregnancy. However, there is no consensus on the diagnosis of varicocele, the indications for VR, or its outcomes. The aim of this study was to explore global practice patterns in the management of varicocele in the context of male infertility.
Materials and Methods:
Sixty practicing urologists/andrologists from 23 countries contributed 382 multiple-choice questions on varicocele management. These were condensed into an online questionnaire that was sent by direct invitation to clinicians involved in male infertility management. The responses were analyzed for agreement and disagreement in practice patterns and compared with the latest guidelines of international professional societies (American Urological Association [AUA], American Society for Reproductive Medicine [ASRM], and European Association of Urology [EAU]) and with evidence from recent systematic reviews and meta-analyses. In addition, an expert opinion on each topic was provided based on the consensus of 16 experts in the field.
Results:
The questionnaire was answered by 574 clinicians from 59 countries, most of whom were urologists/uro-andrologists. A wide diversity of opinion was seen in every aspect of varicocele management: diagnosis, indications for repair, choice of technique, management of subclinical varicocele, and the role of VR in azoospermia. A significant proportion of the responses were at odds with the recommendations of the AUA, ASRM, and EAU. Many clinical situations were identified for which no guidelines are available.
Conclusions:
This study is the largest global survey performed to date on the clinical management of varicocele in male infertility. It demonstrates 1) wide disagreement in the approach to varicocele management, 2) large gaps in the clinical practice guidelines of professional societies, and 3) the need for further studies on several aspects of varicocele management in infertile men.
10.A single emergency center study for evaluation of P-POSSUM and Mannheim Peritonitis Index as a risk prediction model in patients with non-traumatic peritonitis
Boram KIM ; Seong Hun KIM ; Sung Pil Michael CHOE ; Daihai CHOI ; Dong Wook JE ; Woo Young NHO ; Soo Hyung LEE ; Sunho CHO ; Shinwoo KIM ; Hyoungouk KIM ; Jeong Sik YI
Journal of the Korean Society of Emergency Medicine 2022;33(2):193-202
Objective:
Peritonitis is a life-threatening surgical emergency with very high mortality and morbidity. Few Korean studies have evaluated the P-POSSUM (Portsmouth-Physiological and Operative Severity Score for the enUmeration of Mortality and morbidity) and the Mannheim Peritonitis Index (MPI) as risk prediction models for patients with non-traumatic peritonitis who present to the emergency department.
Methods:
This retrospective study included 196 cases of non-traumatic peritonitis seen at a single emergency center from January 2015 to December 2019. Receiver operating characteristic (ROC) curves were constructed for P-POSSUM and MPI, and their areas under the ROC curve (AUCs) were compared. Observed mortality was compared with the mortality expected from P-POSSUM, and goodness of fit was assessed with the Hosmer-Lemeshow test.
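The AUC can be read as a concordance probability: the chance that a randomly chosen non-survivor has a higher risk score than a randomly chosen survivor. A minimal, dependency-free Python sketch of that Mann-Whitney formulation follows. It would be run once with P-POSSUM predicted mortality and once with the MPI score as `scores`. The function name and the example values are hypothetical. A formal comparison of two AUCs measured on the same patients (for example, DeLong's method) is beyond this sketch, and the abstract does not state which test the authors used.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], died: Sequence[bool]) -> float:
    """Area under the ROC curve via the Mann-Whitney formulation.

    Equals the probability that a randomly chosen non-survivor has a higher
    risk score than a randomly chosen survivor; ties count as one half.
    The pairwise loop is O(deaths * survivors), adequate for a cohort of ~200.
    """
    if len(scores) != len(died):
        raise ValueError("scores and died must have the same length")
    deaths = [s for s, d in zip(scores, died) if d]
    survivors = [s for s, d in zip(scores, died) if not d]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for d_score in deaths:
        for s_score in survivors:
            if d_score > s_score:
                concordant += 1.0      # non-survivor ranked higher: concordant pair
            elif d_score == s_score:
                concordant += 0.5      # tied scores count half
    return concordant / (len(deaths) * len(survivors))


if __name__ == "__main__":
    # Hypothetical scores for four patients, not study data.
    print(roc_auc([0.9, 0.7, 0.4, 0.2], [True, False, True, False]))  # 0.75
```

An AUC of 0.5 means the score ranks patients no better than chance and 1.0 means perfect separation. This is the scale on which the reported values of 0.812 and 0.646 should be read.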
Results:
Diastolic blood pressure, blood urea nitrogen, potassium, length of stay, and intensive care unit admission differed significantly between survivors and non-survivors. The AUC was 0.812 for P-POSSUM and 0.646 for MPI. The observed-to-expected mortality ratio for P-POSSUM showed fewer deaths than expected in all risk quintiles, and this gap was most pronounced when expected mortality exceeded 60%.
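The calibration checks described here work on groups of patients ranked by predicted risk. Within each group, observed deaths (O) are compared with expected deaths (E, the sum of predicted probabilities). An O/E ratio below 1 means the model overestimated mortality. The Hosmer-Lemeshow statistic sums $(O - E)^2 / (E(1 - E/n))$ over groups and is conventionally referred to a chi-square distribution with (number of groups - 2) degrees of freedom. A minimal Python sketch follows. The abstract does not say how the quintiles were formed, so equal-size groups after sorting by predicted mortality are an assumption. All names are hypothetical, and the p-value step (which needs a chi-square CDF such as `scipy.stats.chi2.sf`) is omitted to keep the block dependency-free.

```python
from typing import List, Sequence, Tuple

Patient = Tuple[float, bool]  # (predicted mortality, died)


def risk_groups(predicted: Sequence[float], died: Sequence[bool], n_groups: int = 5) -> List[List[Patient]]:
    """Sort patients by predicted mortality and split them into n_groups near-equal groups."""
    if len(predicted) != len(died):
        raise ValueError("predicted and died must have the same length")
    if len(predicted) < n_groups:
        raise ValueError("need at least one patient per group")
    ranked = sorted(zip(predicted, died), key=lambda p: p[0])
    n = len(ranked)
    return [ranked[g * n // n_groups:(g + 1) * n // n_groups] for g in range(n_groups)]


def observed_expected(group: Sequence[Patient]) -> Tuple[int, float]:
    """Observed deaths and expected deaths (sum of predicted probabilities) in one group."""
    observed = sum(1 for _, d in group if d)
    expected = sum(p for p, _ in group)
    return observed, expected


def oe_ratio(group: Sequence[Patient]) -> float:
    """Observed-to-expected ratio; values below 1 mean fewer deaths than predicted."""
    observed, expected = observed_expected(group)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed / expected


def hosmer_lemeshow(groups: Sequence[Sequence[Patient]]) -> float:
    """Hosmer-Lemeshow statistic: sum over groups of (O - E)^2 / (E * (1 - E / n))."""
    stat = 0.0
    for group in groups:
        n = len(group)
        observed, expected = observed_expected(group)
        variance = expected * (1 - expected / n)
        if variance <= 0:
            raise ValueError("group mean predicted risk must lie strictly between 0 and 1")
        stat += (observed - expected) ** 2 / variance
    return stat


if __name__ == "__main__":
    # Hypothetical predictions and outcomes for ten patients, not study data.
    preds = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]
    outcomes = [False, False, False, True, False, False, True, False, True, True]
    quintiles = risk_groups(preds, outcomes)
    for i, grp in enumerate(quintiles, start=1):
        print(f"quintile {i}: O/E = {oe_ratio(grp):.2f}")
    print(f"Hosmer-Lemeshow statistic = {hosmer_lemeshow(quintiles):.2f}")
```

Reporting O/E per quintile, rather than a single overall ratio, is what reveals the pattern described above: miscalibration that grows with predicted risk.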
Conclusion:
In patients with non-traumatic peritonitis, P-POSSUM was more useful than the MPI score for predicting risk. However, P-POSSUM overestimated the risk in high-risk patients. Although the MPI score is of only limited use for predicting mortality in patients with non-traumatic peritonitis, it is useful as an adjunct.