1. Performance of a Large Language Model in the Generation of Clinical Guidelines for Antibiotic Prophylaxis in Spine Surgery
Bashar ZAIDAT ; Nancy SHRESTHA ; Ashley M. ROSENBERG ; Wasil AHMED ; Rami RAJJOUB ; Timothy HOANG ; Mateo Restrepo MEJIA ; Akiro H. DUEY ; Justin E. TANG ; Jun S. KIM ; Samuel K. CHO
Neurospine 2024;21(1):128-146
Objective:
Large language models, such as chat generative pre-trained transformer (ChatGPT), have great potential for streamlining medical processes and assisting physicians in clinical decision-making. This study aimed to assess the potential of 2 ChatGPT models (GPT-3.5 and GPT-4.0) to support clinical decision-making by comparing their responses regarding antibiotic prophylaxis in spine surgery to accepted clinical guidelines.
Methods:
ChatGPT models were prompted with questions from the North American Spine Society (NASS) Evidence-based Clinical Guidelines for Multidisciplinary Spine Care for Antibiotic Prophylaxis in Spine Surgery (2013). Their responses were then compared against the guideline recommendations and assessed for accuracy.
Results:
Of the 16 NASS guideline questions concerning antibiotic prophylaxis, 10 responses (62.5%) were accurate for ChatGPT's GPT-3.5 model and 13 (81.3%) were accurate for GPT-4.0. Twenty-five percent of GPT-3.5 answers were deemed overly confident, while 62.5% of GPT-4.0 answers directly cited the NASS guideline as evidence.
Conclusion
ChatGPT demonstrated an impressive ability to accurately answer clinical questions. The GPT-3.5 model's performance was limited by its tendency to give overly confident responses and its inability to identify the most significant elements in its responses. The GPT-4.0 model's responses had higher accuracy and frequently cited the NASS guideline as direct evidence. While GPT-4.0 is still far from perfect, it has shown an exceptional ability to extract the most relevant research available compared to GPT-3.5. Thus, while ChatGPT has shown far-reaching potential, scrutiny should still be exercised regarding its clinical use at this time.
6. Does humeral fixation technique affect long-term outcomes of total shoulder arthroplasty?
Troy LI ; Kenneth H. LEVY ; Akiro H. DUEY ; Akshar V. PATEL ; Christopher A. WHITE ; Carl M. CIRINO ; Alexis WILLIAMS ; Kathryn WHITELAW ; Dave SHUKLA ; Bradford O. PARSONS ; Evan L. FLATOW ; Paul J. CAGLE
Clinics in Shoulder and Elbow 2023;26(3):245-251
Background:
Cementless humeral fixation has recently gained popularity in anatomic total shoulder arthroplasty. However, few studies have compared clinical, radiographic, and patient-reported outcomes between cemented and press-fit humeral fixation, and none have followed patients for longer than 5 years. In this study, we compared long-term postoperative outcomes in patients receiving a cemented versus a press-fit humeral stem in anatomic total shoulder arthroplasty.
Methods:
This study retrospectively analyzed 169 shoulders that underwent primary anatomic total shoulder arthroplasty (aTSA). Shoulders were stratified by humeral stem fixation technique: cementation or press-fit. Data were collected pre- and postoperatively. Primary outcome measures included range of motion, patient-reported outcomes, and radiographic measures.
Results:
One hundred thirty-eight cemented humeral stems and 31 press-fit stems were included. Significant improvements in range of motion were seen in all aTSA patients, with no significant differences between the cemented and press-fit cohorts at final follow-up (forward elevation: P=0.12, external rotation: P=0.60, internal rotation: P=0.77). Patient-reported outcome metrics also exhibited sustained improvement through final follow-up. However, at final follow-up, the press-fit cohort had significantly better overall scores than the cemented cohort (visual analog scale score: P=0.04, American Shoulder and Elbow Surgeons score: P<0.01, Simple Shoulder Test score: P=0.03). Humeral radiolucency was noted in two cemented implants and one press-fit implant. No significant difference in implant survival was observed between the two cohorts (P=0.75).
Conclusions
In this series, we found that aTSA significantly improves shoulder function irrespective of humeral fixation technique. However, within this cohort, press-fit stems provided significantly better patient-reported outcome scores than cemented stems.
Level of evidence: III.
7. Evaluating the effects of age on the long-term functional outcomes following anatomic total shoulder arthroplasty
Troy LI ; Akiro H. DUEY ; Christopher A. WHITE ; Amit PUJARI ; Akshar V. PATEL ; Bashar ZAIDAT ; Christine S. WILLIAMS ; Alexis WILLIAMS ; Carl M. CIRINO ; Dave SHUKLA ; Bradford O. PARSONS ; Evan L. FLATOW ; Paul J. CAGLE
Clinics in Shoulder and Elbow 2023;26(3):231-237
Methods:
Among the patients who underwent TSA, 119 shoulders were retrospectively analyzed. Preoperative and postoperative clinical outcome data were collected. Linear regression analysis (univariate and multivariate) was conducted to evaluate the associations of clinical outcomes with age. Kaplan-Meier curves and Cox regression analyses were performed to evaluate implant survival.
Results:
At final follow-up, patients of all ages undergoing aTSA experienced significant and sustained improvements in all primary outcome measures compared with preoperative values. Based on multivariate analysis, age at the time of surgery was a significant predictor of postoperative outcomes. Excellent implant survival was observed over the course of this study, and Cox regression survival analysis indicated that age and sex were not associated with an increased risk of implant failure.
Conclusions
When controlling for sex and follow-up duration, older age was associated with significantly better patient-reported outcome measures. Despite this difference, we noted no significant effects on range of motion or implant survival.
Level of evidence: IV.