1. Using Machine Learning Models to Identify Factors Associated With 30-Day Readmissions After Posterior Cervical Fusions: A Longitudinal Cohort Study
Aneysis D. GONZALEZ-SUAREZ ; Paymon G. REZAII ; Daniel HERRICK ; Seth Stravers TIGCHELAAR ; John K. RATLIFF ; Mirabela RUSU ; David SCHEINKER ; Ikchan JEON ; Atman M. DESAI
Neurospine 2024;21(2):620-632
Objective:
Readmission rates after posterior cervical fusion (PCF) place a significant burden on patients and healthcare systems, with complication rates of 15%–25% and 90-day readmission rates of up to 12%. In this study, we test whether machine learning (ML) models that capture interactions among factors outperform traditional logistic regression (LR) in identifying factors associated with readmission.
Methods:
The Optum Clinformatics Data Mart database was used to identify patients who underwent PCF between 2004 and 2017. To determine factors associated with 30-day readmission, 5 ML models, including a multivariate LR (MLR) model, were generated and evaluated. The best-performing model, a gradient boosting machine (GBM), was then compared with the LACE (Length of stay, Acuity of admission, Comorbidity, and Emergency department visits) index with respect to the potential cost savings of algorithm implementation.
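To make the modeling setup concrete, the following minimal sketch trains a gradient boosting classifier and a multivariate logistic regression on a binary 30-day readmission label and compares their test AUROCs with scikit-learn. This is not the authors' pipeline: the features and labels are synthetic placeholders standing in for claims-derived variables such as length of stay, comorbidity count, and billed charges.

```python
# Minimal sketch (assumed setup, not the study's actual pipeline):
# GBM vs. multivariate logistic regression on a binary readmission label.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 4130  # cohort size reported in the abstract

# Synthetic stand-ins for claims-derived features (illustrative only).
X = rng.normal(size=(n, 4))
y = (X @ np.array([0.8, 0.5, 0.3, 0.6]) + rng.normal(size=n) > 1.2).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

gbm = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
mlr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

for name, model in [("GBM", gbm), ("MLR", mlr)]:
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name} test AUROC: {auc:.3f}")
```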
Results:
This study included 4,130 patients, 874 of whom were readmitted within 30 days. After scaling, the factors with the greatest influence on the MLR model were patient discharge status, comorbidities, and number of procedure codes, while patient discharge status, billed admission charge, and length of stay most influenced the GBM model. The GBM model significantly outperformed MLR in predicting unplanned readmissions (mean area under the receiver operating characteristic curve, 0.846 vs. 0.829; p < 0.001), while also projecting average cost savings 50% greater than those of the LACE index.
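The abstract does not state which test produced the p < 0.001 for the AUROC difference; one common choice is a paired bootstrap over the held-out patients, sketched below under that assumption.

```python
# Assumed approach: paired bootstrap test for an AUROC difference.
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_diff(y_true, p_a, p_b, n_boot=2000, seed=0):
    """Observed AUROC difference (a - b) and a two-sided bootstrap p-value."""
    rng = np.random.default_rng(seed)
    y_true, p_a, p_b = map(np.asarray, (y_true, p_a, p_b))
    n = len(y_true)
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample patients with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue  # skip draws containing a single class
        diffs.append(roc_auc_score(y_true[idx], p_a[idx])
                     - roc_auc_score(y_true[idx], p_b[idx]))
    diffs = np.asarray(diffs)
    observed = roc_auc_score(y_true, p_a) - roc_auc_score(y_true, p_b)
    # Two-sided p-value: how often the resampled difference crosses zero.
    p = 2 * min((diffs <= 0).mean(), (diffs >= 0).mean())
    return observed, min(p, 1.0)
```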
Conclusion:
Five models (GBM, XGBoost [extreme gradient boosting], RF [random forest], LASSO [least absolute shrinkage and selection operator], and MLR) were evaluated; among these, the GBM model exhibited superior predictive performance, robustness, and accuracy. Factors associated with readmission influence the LR and GBM models differently, suggesting that the 2 models can be used complementarily. For PCF procedures, the GBM model achieved greater predictive performance and was associated with higher theoretical cost savings for readmissions arising from PCF complications.
2. Analyzing Large Language Models’ Responses to Common Lumbar Spine Fusion Surgery Questions: A Comparison Between ChatGPT and Bard
Siegmund Philipp LANG ; Ezra Tilahun YOSEPH ; Aneysis D. GONZALEZ-SUAREZ ; Robert KIM ; Parastou FATEMI ; Katherine WAGNER ; Nicolai MALDANER ; Martin N. STIENEN ; Corinna Clio ZYGOURAKIS
Neurospine 2024;21(2):633-641
Objective:
In the digital age, patients increasingly turn to online sources for information about lumbar spine fusion, necessitating careful study of large language models (LLMs) such as Chat Generative Pre-trained Transformer (ChatGPT) for patient education.
Methods:
This study aims to assess the quality of responses of OpenAI’s ChatGPT 3.5 and Google’s Bard to patient questions about lumbar spine fusion surgery. We identified 10 critical questions from 158 frequently asked questions found via Google search and presented them to both chatbots. Five blinded spine surgeons rated the responses on a 4-point scale from ‘unsatisfactory’ to ‘excellent.’ The clarity and professionalism of the answers were also evaluated on a 5-point Likert scale.
Results:
Across the 10 questions, 97% of the responses from ChatGPT 3.5 and Bard were rated excellent or satisfactory. Specifically, 62% of ChatGPT’s responses were excellent and 32% required only minimal clarification, with 6% needing moderate or substantial clarification. Bard’s responses were 66% excellent and 24% minimally clarifying, with 10% requiring more clarification. No significant difference was found in the overall rating distributions of the 2 models. Both struggled with 3 specific questions regarding surgical risks, success rates, and selection of surgical approach (Q3, Q4, and Q5). Interrater reliability was low for both models (ChatGPT: κ = 0.041, p = 0.622; Bard: κ = -0.040, p = 0.601). While both models scored well on understanding and empathy, Bard received marginally lower ratings in empathy and professionalism.
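For context on the reported reliability figures: with 5 raters scoring each question, a multi-rater agreement statistic such as Fleiss’ kappa fits the single kappa per chatbot reported above, though the exact method is our assumption. The sketch below computes Fleiss’ kappa with statsmodels; the ratings are made-up placeholders on the 4-point scale (0 = unsatisfactory … 3 = excellent).

```python
# Hedged sketch: Fleiss' kappa for 5 raters x 10 questions.
# The rating matrix is illustrative, not the study's data.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# rows = 10 questions, columns = 5 blinded surgeon raters
ratings = np.array([
    [3, 3, 2, 3, 3],
    [3, 2, 3, 3, 2],
    [2, 2, 1, 3, 2],
    [1, 2, 2, 1, 3],
    [2, 3, 2, 2, 1],
    [3, 3, 3, 2, 3],
    [3, 2, 3, 3, 3],
    [2, 3, 3, 2, 2],
    [3, 3, 2, 3, 3],
    [3, 2, 3, 2, 3],
])

table, _ = aggregate_raters(ratings)  # per-question counts in each category
print(f"Fleiss' kappa: {fleiss_kappa(table):.3f}")
```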
Conclusion:
ChatGPT 3.5 and Bard effectively answered frequently asked questions about lumbar spine fusion, but further training and research are needed to solidify LLMs’ role in medical education and healthcare communication.