1. Analyzing Large Language Models’ Responses to Common Lumbar Spine Fusion Surgery Questions: A Comparison Between ChatGPT and Bard
Siegmund Philipp LANG ; Ezra Tilahun YOSEPH ; Aneysis D. GONZALEZ-SUAREZ ; Robert KIM ; Parastou FATEMI ; Katherine WAGNER ; Nicolai MALDANER ; Martin N. STIENEN ; Corinna Clio ZYGOURAKIS
Neurospine 2024;21(2):633-641
Objective:
In the digital age, patients increasingly turn to online sources for information on lumbar spine fusion, which calls for careful study of large language models (LLMs), such as Chat Generative Pre-trained Transformer (ChatGPT), as tools for patient education.
Methods:
This study aimed to assess the quality of responses from OpenAI’s ChatGPT 3.5 and Google’s Bard to patient questions about lumbar spine fusion surgery. From 158 frequently asked questions identified via Google search, we selected 10 critical questions and presented them to both chatbots. Five blinded spine surgeons rated each response on a 4-point scale ranging from ‘unsatisfactory’ to ‘excellent.’ The clarity and professionalism of the answers were also evaluated using a 5-point Likert scale.
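The abstract reports interrater reliability as a kappa statistic without naming the variant. With five raters scoring every response on the same 4-point scale, Fleiss' kappa is one common choice. The following is a minimal standard-library sketch of it, assuming each row holds the number of raters who chose each category for one chatbot response; the function name and table layout are illustrative assumptions, not the authors' code.

```python
from typing import Sequence


def fleiss_kappa(table: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a subjects x categories table of rater counts.

    Each row lists how many raters assigned one subject (e.g. one chatbot
    response) to each category (e.g. the 4 rating levels). Every row must
    sum to the same number of raters.
    """
    if len(table) == 0:
        raise ValueError("need at least one subject")
    n_subjects = len(table)
    n_raters = sum(table[0])
    if n_raters < 2:
        raise ValueError("need at least two raters")
    if any(sum(row) != n_raters for row in table):
        raise ValueError("every subject must be rated by the same number of raters")

    # Observed agreement per subject: share of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall proportion of ratings in each category.
    n_categories = len(table[0])
    total = n_subjects * n_raters
    p_j = [sum(row[j] for row in table) / total for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1.0 - p_e)
```

A value near 0 means agreement is close to what chance alone would produce, and a negative value means slightly less agreement than chance. Kappa can be low even when most ratings are ‘excellent’, because a skewed category distribution inflates the chance-agreement term.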
Results:
Across the 10 questions, 97% of the responses from ChatGPT 3.5 and Bard were rated excellent or satisfactory. For ChatGPT, 62% of responses were rated excellent and 32% satisfactory requiring minimal clarification, and only 6% required moderate or substantial clarification. For Bard, 66% were rated excellent and 24% satisfactory requiring minimal clarification, and 10% required moderate or substantial clarification. The overall rating distributions of the 2 models did not differ significantly. Both models struggled with 3 questions concerning surgical risks, success rates, and the selection of surgical approach (Q3, Q4, and Q5). Interrater reliability was low for both models (ChatGPT: κ = 0.041, p = 0.622; Bard: κ = −0.040, p = 0.601). Both models scored well on understanding and empathy, although Bard received marginally lower ratings for empathy and professionalism.
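The Results give percentages rather than counts. Assuming a complete design of 10 questions × 5 raters (50 ratings per model), each percentage maps onto a whole count, and a chi-square test of homogeneity is one way to compare the two distributions. The abstract does not name the test the authors used; this sketch merges moderate and substantial clarification into one category, as the abstract does, and relies on the fact that a chi-square distribution with 2 degrees of freedom has the exact tail probability exp(−X²/2).

```python
import math
from typing import List, Sequence, Tuple

# Assumed design: 10 questions x 5 raters = 50 ratings per chatbot.
RATINGS_PER_MODEL = 10 * 5

# Percentages from the abstract, in the order:
# excellent, minimal clarification, moderate or substantial clarification.
CHATGPT_PCT = [62, 32, 6]
BARD_PCT = [66, 24, 10]


def to_counts(percentages: Sequence[float], total: int = RATINGS_PER_MODEL) -> List[int]:
    """Convert reported percentages back to whole rating counts."""
    counts = [round(p * total / 100) for p in percentages]
    if sum(counts) != total:
        raise ValueError("percentages do not map cleanly onto the assumed total")
    return counts


def chi_square_2xk(row_a: Sequence[int], row_b: Sequence[int]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table."""
    col_totals = [a + b for a, b in zip(row_a, row_b)]
    grand_total = sum(col_totals)
    stat = 0.0
    for row in (row_a, row_b):
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat, len(col_totals) - 1


def chi_square_sf_df2(stat: float) -> float:
    """Upper-tail p-value for a chi-square statistic with exactly 2 df."""
    # With 2 df the chi-square distribution is exponential with mean 2.
    return math.exp(-stat / 2.0)


chatgpt = to_counts(CHATGPT_PCT)  # [31, 16, 3]
bard = to_counts(BARD_PCT)        # [33, 12, 5]
stat, df = chi_square_2xk(chatgpt, bard)
if df != 2:
    raise ValueError("the closed-form p-value below is only valid for 2 df")
p_value = chi_square_sf_df2(stat)
print(f"X^2 = {stat:.2f}, df = {df}, p = {p_value:.2f}")  # X^2 = 1.13, df = 2, p = 0.57
```

The result is non-significant, in line with the abstract, but two expected cell counts are only 4, so an exact test (for example, a Fisher–Freeman–Halton test) would be the more defensible choice on data this sparse.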
Conclusion:
ChatGPT 3.5 and Bard effectively answered FAQs about lumbar spine fusion, but further training and research are needed to establish the role of LLMs in medical education and healthcare communication.
4. Analysis of East Asia subgroup in Study 309/KEYNOTE-775: lenvatinib plus pembrolizumab versus treatment of physician’s choice chemotherapy in patients with previously treated advanced or recurrent endometrial cancer
Kan YONEMORI ; Keiichi FUJIWARA ; Kosei HASEGAWA ; Mayu YUNOKAWA ; Kimio USHIJIMA ; Shiro SUZUKI ; Ayumi SHIKAMA ; Shinichiro MINOBE ; Tomoka USAMI ; Jae-Weon KIM ; Byoung-Gie KIM ; Peng-Hui WANG ; Ting-Chang CHANG ; Keiko YAMAMOTO ; Shirong HAN ; Jodi MCKENZIE ; Robert J. ORLOWSKI ; Takuma MIURA ; Vicky MAKKER ; Yong Man KIM
Journal of Gynecologic Oncology 2024;35(2):e40
Objective:
In the global phase 3 Study 309/KEYNOTE-775 (NCT03517449) at the first interim analysis, lenvatinib+pembrolizumab significantly improved progression-free survival (PFS), overall survival (OS), and objective response rate (ORR) versus treatment of physician’s choice chemotherapy (TPC) in patients with previously treated advanced/recurrent endometrial cancer (EC). This exploratory analysis evaluated outcomes in patients enrolled in East Asia at the time of prespecified final analysis.
Methods:
Women aged ≥18 years with histologically confirmed advanced, recurrent, or metastatic EC and disease progression after 1 prior platinum-based chemotherapy regimen (2 if 1 was given in the neoadjuvant/adjuvant setting) were enrolled. Patients were randomized 1:1 to lenvatinib 20 mg orally once daily plus pembrolizumab 200 mg intravenously every 3 weeks (≤35 cycles) or TPC (doxorubicin or paclitaxel). The primary endpoints were PFS per RECIST v1.1 by blinded independent central review, and OS. No alpha was assigned for this subgroup analysis.
Results:
Among 155 East Asian patients (lenvatinib+pembrolizumab, n=77; TPC, n=78), the median follow-up (data cutoff: March 1, 2022) was 34.3 months (range, 25.1–43.0). Hazard ratios (HRs) with 95% confidence intervals (CIs) for PFS (lenvatinib+pembrolizumab vs. TPC) were 0.74 (0.49–1.10) in the mismatch repair proficient (pMMR) population and 0.64 (0.44–0.94) in the all-comer population. The corresponding HRs (95% CIs) for OS were 0.68 (0.45–1.02) and 0.61 (0.41–0.90). ORRs with lenvatinib+pembrolizumab versus TPC were 36% versus 22% in the pMMR population and 39% versus 21% in the all-comer population. Treatment-related adverse events occurred in 97% of patients with lenvatinib+pembrolizumab and 96% with TPC (grade 3–5: 74% and 72%, respectively).
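Because no alpha was assigned to this subgroup analysis, the HRs are descriptive. One quick way to see which intervals exclude 1 is to recover the approximate standard error of log(HR) from each reported 95% CI and compute a nominal Wald p-value. This is a back-of-the-envelope sketch assuming symmetric Wald intervals on the log scale; rounding in the reported values makes the results approximate, and they are not statistics reported by the trial.

```python
import math
from statistics import NormalDist
from typing import Dict, Tuple


def wald_from_ci(hr: float, lower: float, upper: float,
                 level: float = 0.95) -> Tuple[float, float, float]:
    """Approximate SE of log(HR), z statistic, and two-sided nominal p from a CI.

    Assumes the CI was built as exp(log(HR) +/- z_crit * SE).
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p


# HR (95% CI) values as reported for the East Asia subgroup.
REPORTED: Dict[str, Tuple[float, float, float]] = {
    "PFS, pMMR": (0.74, 0.49, 1.10),
    "PFS, all-comers": (0.64, 0.44, 0.94),
    "OS, pMMR": (0.68, 0.45, 1.02),
    "OS, all-comers": (0.61, 0.41, 0.90),
}

results: Dict[str, float] = {}
for label, (hr, lo, hi) in REPORTED.items():
    se, z, p = wald_from_ci(hr, lo, hi)
    results[label] = p
    excludes_one = hi < 1.0 or lo > 1.0
    print(f"{label}: SE(log HR) = {se:.3f}, z = {z:.2f}, "
          f"nominal p = {p:.3f}, CI excludes 1: {excludes_one}")
```

As expected, the two pMMR intervals that cross 1 give nominal p-values above 0.05, while the all-comer intervals give nominal p-values below 0.05.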
Conclusion:
Lenvatinib+pembrolizumab provided clinically meaningful benefit with manageable safety compared with TPC, supporting its use in East Asian patients with previously treated advanced/recurrent EC.
9. Brain Derived Neurotrophic Factor Methylation and Long-term Outcomes after Stroke Interacting with Suicidal Ideation
Hee-Ju KANG ; Ju-Wan KIM ; Joon-Tae KIM ; Man-Seok PARK ; Byung Jo CHUN ; Sung-Wan KIM ; Il-Seon SHIN ; Robert STEWART ; Jae-Min KIM
Clinical Psychopharmacology and Neuroscience 2024;22(2):306-313
Objective:
This study aimed to evaluate the previously unexplored relationship between brain-derived neurotrophic factor (BDNF) methylation and long-term stroke outcomes, and the interaction of this relationship with suicidal ideation (SI), which is closely associated with both BDNF expression and stroke outcomes.
Methods:
A total of 278 stroke patients were assessed at 2 weeks post-stroke for BDNF methylation status and for SI, using the suicide-related item of the Montgomery–Åsberg Depression Rating Scale. As the long-term stroke outcome, we investigated the incidence of composite cerebro-cardiovascular events (CCVEs) over an 8–14-year period after the initial stroke. Cox regression models adjusted for covariates were used to evaluate the association between BDNF methylation status and CCVEs, as well as its interaction with post-stroke SI at 2 weeks.
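Under the usual proportional-hazards form, the interaction analysis can be written as follows. The abstract does not give the exact model specification or covariate set, so the notation is an assumption: M_i is the methylation value at a given CpG site (or the average value), S_i is an indicator for SI at 2 weeks, and x_i is the vector of adjustment covariates.

```latex
\begin{aligned}
h_i(t) &= h_0(t)\,\exp\!\left(\beta_1 M_i + \beta_2 S_i + \beta_3 M_i S_i + \boldsymbol{\gamma}^{\top}\mathbf{x}_i\right),\\
\mathrm{HR}_{M \mid S=0} &= e^{\beta_1}, \qquad
\mathrm{HR}_{M \mid S=1} = e^{\beta_1 + \beta_3}, \qquad
H_0\colon \beta_3 = 0 .
\end{aligned}
```

In this form, a significant interaction, such as the one reported for CpG 8, corresponds to rejecting H_0: β_3 = 0, whereas an association that is prominent only in patients with SI can appear in stratified estimates without a significant β_3. If repeated events per patient were counted, a recurrent-event extension (for example, the Andersen–Gill model) would replace the single-event hazard; the abstract does not say which approach was used.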
Results:
Higher methylation status at CpGs 1, 3, and 5, but not the average methylation value, predicted a greater number of composite CCVEs during the 8–14 years following the stroke. The associations of higher methylation status at CpGs 1, 3, 5, and 8, and of the average BDNF methylation value, with a greater number of composite CCVEs were prominent in patients who had post-stroke SI at 2 weeks. Notably, a significant interaction between methylation status and SI on composite CCVEs was observed only for CpG 8.
Conclusion:
BDNF methylation was significantly associated with poor long-term stroke outcomes, particularly in individuals who had post-stroke SI at 2 weeks. This suggests that evaluating BDNF methylation status as a biological marker, together with assessing SI during the acute phase of stroke, can help predict long-term outcomes.
10. Five-Fraction High-Conformal Ultrahypofractionated Radiotherapy for Primary Tumors in Metastatic Breast Cancer
Jeongshim LEE ; Jee Hung KIM ; Mitchell LIU ; Andrew BANG ; Robert OLSON ; Jee Suk CHANG
Journal of Breast Cancer 2024;27(2):91-104
Purpose:
To report the local control and toxicity of 5-fraction, high-conformal ultrahypofractionated radiation therapy (RT) for primary tumors in patients with metastatic breast cancer (MBC) who did not undergo planned surgical intervention.
Methods:
We retrospectively reviewed 27 patients with MBC who underwent 5-fraction, high-dose ultrahypofractionated intensity-modulated RT to their primary tumors at our institution between 2017 and 2022. A median dose of 66.8 Gy (range, 51.8–83.6 Gy), expressed as the equivalent dose in 2-Gy fractions with an α/β ratio of 3.5, was prescribed to the gross tumor; a simultaneous integrated boost was used in 81.5% of patients. The primary endpoint was local control.
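Doses expressed in 2-Gy equivalents follow the standard linear-quadratic EQD2 formula, EQD2 = D × (d + α/β) / (2 + α/β), where D is the total physical dose and d the dose per fraction. The sketch below applies it with α/β = 3.5 Gy as stated; the physical prescriptions are not given in the abstract, so the example schedules are illustrative assumptions only.

```python
import math


def eqd2(total_dose_gy: float, n_fractions: int, alpha_beta_gy: float = 3.5) -> float:
    """Equivalent dose in 2-Gy fractions under the linear-quadratic model."""
    if n_fractions <= 0:
        raise ValueError("n_fractions must be positive")
    dose_per_fraction = total_dose_gy / n_fractions
    return total_dose_gy * (dose_per_fraction + alpha_beta_gy) / (2.0 + alpha_beta_gy)


def dose_per_fraction_for_eqd2(target_eqd2_gy: float, n_fractions: int,
                               alpha_beta_gy: float = 3.5) -> float:
    """Invert eqd2(): find d such that n * d * (d + a/b) = EQD2 * (2 + a/b)."""
    # Positive root of d^2 + (a/b) * d - EQD2 * (2 + a/b) / n = 0.
    c = target_eqd2_gy * (2.0 + alpha_beta_gy) / n_fractions
    return (-alpha_beta_gy + math.sqrt(alpha_beta_gy ** 2 + 4.0 * c)) / 2.0


# Illustrative 5-fraction schedules, not taken from the study.
for total in (30.0, 35.0, 40.0):
    print(f"{total:.0f} Gy / 5 fx -> EQD2 = {eqd2(total, 5):.1f} Gy")
```

These three schedules give 51.8, 66.8, and 83.6 Gy, numerically matching the reported minimum, median, and maximum, but the abstract does not confirm that they were the prescriptions used. As a sanity check, conventional 2-Gy fractions return the physical dose unchanged (eqd2(60, 30) gives 60 Gy).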
Results:
The median tumor size and volume were 5.1 cm and 112.4 cm³, respectively. Treatment was generally well tolerated: only 15% of patients experienced mild acute skin toxicity, which resolved spontaneously. The best in-field response rate, assessed until local progression or the last follow-up, was 82%, with objective responses observed at a median of 10.8 months post-RT (range, 1.4–29.2 months). At a median follow-up of 18.3 months, the 2-year local control rate was 77%. A higher number of prior lines of systemic therapy was significantly associated with poorer 2-year local control (1–2 lines, 94% vs. ≥3 lines, 34%; p = 0.004). After RT, 67% of patients transitioned to the next line of systemic therapy, and the median duration of maintaining the same systemic therapy post-RT was 16.3 months (range, 1.9–40.3 months).
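Actuarial rates such as a 2-year local control rate are usually estimated with the Kaplan–Meier method, which accounts for patients censored before 2 years; the abstract does not state which estimator was used. A minimal sketch follows, using hypothetical toy follow-up data rather than the study's data.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier curve as (event time, survival) pairs.

    times: follow-up in months; events: 1 = local progression, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all observations tied at time t (events and censorings).
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


def survival_at(curve: Sequence[Tuple[float, float]], t: float) -> float:
    """Step-function lookup: survival just after the last event time <= t."""
    value = 1.0
    for time, s in curve:
        if time <= t:
            value = s
        else:
            break
    return value


# Hypothetical follow-up (months) and event flags, for illustration only.
toy_times = [6.0, 12.0, 18.0, 30.0, 30.0]
toy_events = [1, 0, 1, 0, 0]
curve = kaplan_meier(toy_times, toy_events)
print(f"2-year local control: {survival_at(curve, 24.0):.1%}")  # 53.3%
```

Because the patient censored at 12 months leaves the risk set without counting as a failure, the estimate (53.3%) differs from the crude share of toy patients free of progression (3 of 5).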
Conclusion:
In our small dataset, 5-fraction, high-conformal ultrahypofractionated breast RT offered promising 2-year local control with minimal toxicity. Further studies are warranted to determine the optimal dose and the role of this approach in this setting.
