1. Analyzing Large Language Models’ Responses to Common Lumbar Spine Fusion Surgery Questions: A Comparison Between ChatGPT and Bard
Siegmund Philipp LANG ; Ezra Tilahun YOSEPH ; Aneysis D. GONZALEZ-SUAREZ ; Robert KIM ; Parastou FATEMI ; Katherine WAGNER ; Nicolai MALDANER ; Martin N. STIENEN ; Corinna Clio ZYGOURAKIS
Neurospine 2024;21(2):633-641
Objective:
As patients increasingly turn to online sources for information on lumbar spine fusion, large language models (LLMs) such as Chat Generative Pre-trained Transformer (ChatGPT) warrant careful study as tools for patient education.
Methods:
This study assessed the quality of responses from OpenAI’s ChatGPT 3.5 and Google’s Bard to patient questions about lumbar spine fusion surgery. Ten critical questions were selected from 158 frequently asked questions identified via Google search and presented to both chatbots. Five blinded spine surgeons rated the responses on a 4-point scale ranging from ‘unsatisfactory’ to ‘excellent.’ The clarity and professionalism of the answers were also evaluated on a 5-point Likert scale.
Results:
In our evaluation of 10 questions across ChatGPT 3.5 and Bard, 97% of responses were rated as excellent or satisfactory. Specifically, 62% of ChatGPT’s responses were rated excellent and 32% required minimal clarification, with only 6% needing moderate or substantial clarification. Of Bard’s responses, 66% were rated excellent and 24% required minimal clarification, with 10% requiring more clarification. The overall rating distributions of the 2 models did not differ significantly. Both models struggled with 3 questions, on surgical risks, success rates, and the selection of surgical approach (Q3, Q4, and Q5). Interrater reliability was low for both models (ChatGPT: κ = 0.041, p = 0.622; Bard: κ = -0.040, p = 0.601). Both models scored well on understanding and empathy, although Bard received marginally lower ratings for empathy and professionalism.
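The abstract reports a kappa statistic for five raters but does not name the variant. The following is a minimal sketch, assuming a Fleiss-type kappa for multiple raters and a hypothetical coding of the 4-point scale as 1 (excellent) to 4 (unsatisfactory). It shows how kappa compares observed agreement with the agreement expected from the pooled category proportions. The function name, the coding, and the demo ratings are illustrative assumptions and are not the study's data or analysis code.

```python
from collections import Counter


def fleiss_kappa(ratings: list[list[int]], categories: list[int]) -> float:
    """Fleiss' kappa for items that are each rated by the same number of raters.

    ratings[i] lists the category every rater assigned to item i.
    categories lists all possible categories (e.g. a 4-point scale).
    """
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("need at least one rated item")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    if any(x not in categories for r in ratings for x in r):
        raise ValueError("rating outside the declared categories")

    # counts[i][j]: how many raters put item i into categories[j]
    counts = []
    for r in ratings:
        tally = Counter(r)
        counts.append([tally[c] for c in categories])

    # Observed agreement: share of agreeing rater pairs per item, averaged
    p_items = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_obs = sum(p_items) / n_items

    # Chance agreement from the pooled category proportions
    total = n_items * n_raters
    p_cats = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_exp = sum(p * p for p in p_cats)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")

    return (p_obs - p_exp) / (1 - p_exp)


if __name__ == "__main__":
    # Hypothetical coding: 1 = excellent ... 4 = unsatisfactory.
    # Invented ratings (4 questions x 5 raters), not the study's data.
    scale = [1, 2, 3, 4]
    demo = [
        [1, 1, 1, 1, 2],
        [1, 1, 1, 2, 1],
        [1, 1, 1, 1, 2],
        [2, 1, 1, 1, 1],
    ]
    print(round(fleiss_kappa(demo, scale), 3))
```

In this invented demo, every rating falls in the top two categories, yet kappa is -0.25. When ratings cluster in a few categories, chance-expected agreement is already high. Kappa can therefore be near zero or negative even when most responses are rated favorably.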
Conclusion:
ChatGPT 3.5 and Bard effectively answered lumbar spine fusion FAQs, but further training and research are needed to establish the role of LLMs in medical education and healthcare communication.
4. Current Pediatric Endoscopy Training Situation in the Asia-Pacific Region: A Collaborative Survey by the Asian Pan-Pacific Society for Pediatric Gastroenterology, Hepatology and Nutrition Endoscopy Scientific Subcommittee
Nuthapong UKARAPOL ; Narumon TANATIP ; Ajay SHARMA ; Maribel VITUG-SALES ; Robert Nicholas LOPEZ ; Rohan MALIK ; Ruey Terng NG ; Shuichiro UMETSU ; Songpon GETSUWAN ; Tak Yau Stephen LUI ; Yao-Jong YANG ; Yeoun Joo LEE ; Katsuhiro ARAI ; Kyung Mo KIM ;
Pediatric Gastroenterology, Hepatology & Nutrition 2024;27(4):258-265
Purpose:
To date, there is no region-specific guideline for pediatric endoscopy training. This study aimed to describe the current status of pediatric endoscopy training in the Asia-Pacific region and to identify opportunities for improvement.
Methods:
A cross-sectional survey, using a standardized electronic questionnaire, was conducted among medical schools in the Asia-Pacific region in January 2024.
Results:
A total of 57 medical centers in 12 countries with formal pediatric gastroenterology training programs participated in this regional survey. More than 75% of the centers had an average caseload of <10 cases per week for both diagnostic and therapeutic endoscopies. Only 36% of the programs used competency-based outcomes for program development, whereas nearly half (48%) used volume-based curricula. Foreign body retrieval, polypectomy, percutaneous endoscopic gastrostomy, and esophageal variceal hemostasis (endoscopic variceal sclerotherapy or endoscopic variceal ligation) were the top 4 procedures that trainees should master to the autonomous (unconscious competence) stage. Regarding the learning environment, only 31.5% of centers provided formal hands-on workshops or simulation training. Direct observation of procedural skills was the most commonly used assessment method. Quality assurance (QA) systems were applied to education in only 28% of centers and to patient care (Pediatric Endoscopy Quality Improvement Network) in only 17%.
Conclusion:
Compared with programs in Western academic societies, limited case availability remains a major concern. To close this gap, simulation and adult endoscopy training are essential. Implementing reliable and valid assessment tools and QA systems could substantially strengthen future programs.
5. Analysis of East Asia subgroup in Study 309/KEYNOTE-775: lenvatinib plus pembrolizumab versus treatment of physician’s choice chemotherapy in patients with previously treated advanced or recurrent endometrial cancer
Kan YONEMORI ; Keiichi FUJIWARA ; Kosei HASEGAWA ; Mayu YUNOKAWA ; Kimio USHIJIMA ; Shiro SUZUKI ; Ayumi SHIKAMA ; Shinichiro MINOBE ; Tomoka USAMI ; Jae-Weon KIM ; Byoung-Gie KIM ; Peng-Hui WANG ; Ting-Chang CHANG ; Keiko YAMAMOTO ; Shirong HAN ; Jodi MCKENZIE ; Robert J. ORLOWSKI ; Takuma MIURA ; Vicky MAKKER ; Yong Man KIM
Journal of Gynecologic Oncology 2024;35(2):e40
Objective:
At the first interim analysis of the global phase 3 Study 309/KEYNOTE-775 (NCT03517449), lenvatinib+pembrolizumab significantly improved progression-free survival (PFS), overall survival (OS), and objective response rate (ORR) versus treatment of physician’s choice chemotherapy (TPC) in patients with previously treated advanced/recurrent endometrial cancer (EC). This exploratory analysis evaluated outcomes in patients enrolled in East Asia at the time of the prespecified final analysis.
Methods:
Women ≥18 years of age with histologically confirmed advanced, recurrent, or metastatic EC and disease progression after 1 prior platinum-based chemotherapy regimen (or 2, if 1 was given in the neoadjuvant/adjuvant setting) were enrolled. Patients were randomized 1:1 to lenvatinib 20 mg orally once daily plus pembrolizumab 200 mg intravenously every 3 weeks (≤35 cycles) or TPC (doxorubicin or paclitaxel). The primary endpoints were PFS per RECIST v1.1 by blinded independent central review, and OS. No alpha was assigned to this subgroup analysis.
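The parenthetical prior-therapy rule is easy to misread, so the following sketch encodes one straightforward reading of it. The rule allows exactly 1 prior platinum-based regimen, or 2 if one of them was given in the neoadjuvant/adjuvant setting. The `PriorRegimen` record and both function names are hypothetical. Histology, stage, non-platinum regimens, and every other protocol criterion are deliberately ignored, so this is not the trial's screening logic.

```python
from dataclasses import dataclass


@dataclass
class PriorRegimen:
    """One prior chemotherapy regimen (hypothetical record structure)."""
    platinum_based: bool
    neoadjuvant_or_adjuvant: bool


def meets_prior_platinum_rule(regimens: list[PriorRegimen]) -> bool:
    """Simplified reading of the abstract's rule: exactly 1 prior
    platinum-based regimen, or 2 if one of them was neoadjuvant/adjuvant."""
    platinum = [r for r in regimens if r.platinum_based]
    if len(platinum) == 1:
        return True
    if len(platinum) == 2:
        return any(r.neoadjuvant_or_adjuvant for r in platinum)
    return False


def is_eligible(age: int, progressed: bool, regimens: list[PriorRegimen]) -> bool:
    """Combine the age, progression, and prior-therapy conditions.

    Histology, disease stage, and all other protocol criteria are omitted.
    """
    return age >= 18 and progressed and meets_prior_platinum_rule(regimens)


if __name__ == "__main__":
    # Platinum given adjuvantly, then again at recurrence: 2 regimens, allowed.
    history = [PriorRegimen(True, True), PriorRegimen(True, False)]
    print(is_eligible(62, True, history))
```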
Results:
Among 155 East Asian patients (lenvatinib+pembrolizumab, n=77; TPC, n=78), the median follow-up (data cutoff: March 1, 2022) was 34.3 months (range, 25.1–43.0). Hazard ratios (HRs) with 95% confidence intervals (CIs) for PFS (lenvatinib+pembrolizumab vs. TPC) were 0.74 (0.49–1.10) in the mismatch repair proficient (pMMR) population and 0.64 (0.44–0.94) in the all-comer population. The corresponding HRs (95% CI) for OS were 0.68 (0.45–1.02) and 0.61 (0.41–0.90). ORRs with lenvatinib+pembrolizumab vs. TPC were 36% vs. 22% (pMMR) and 39% vs. 21% (all-comers). Treatment-related adverse events occurred in 97% of patients receiving lenvatinib+pembrolizumab and 96% receiving TPC (grade 3–5: 74% and 72%, respectively).
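The reported HRs and 95% CIs can be read on the log scale. The following back-of-envelope sketch assumes each CI is a Wald interval symmetric around log(HR). Under that assumption, the standard error of log(HR) can be recovered, and it becomes clear which intervals exclude 1. The function and constant names are illustrative. Inputs are the rounded values printed in the abstract, so outputs are approximate.

```python
import math

# Two-sided 95% critical value of the standard normal distribution
Z_95 = 1.959964


def log_hr_summary(hr: float, lower: float, upper: float) -> dict:
    """Back-calculate the standard error of log(HR) from a reported 95% CI.

    Assumes a Wald interval that is symmetric on the log scale:
    log(lower) = log(hr) - Z_95 * SE and log(upper) = log(hr) + Z_95 * SE.
    Reported values are rounded, so results are approximate.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return {
        "se_log_hr": se,
        # Negative when HR < 1, i.e. lower hazard in the experimental arm
        "z": math.log(hr) / se,
        "ci_excludes_1": upper < 1.0 or lower > 1.0,
    }


if __name__ == "__main__":
    reported = {
        "PFS, pMMR": (0.74, 0.49, 1.10),
        "PFS, all-comers": (0.64, 0.44, 0.94),
        "OS, pMMR": (0.68, 0.45, 1.02),
        "OS, all-comers": (0.61, 0.41, 0.90),
    }
    for label, (hr, lo, hi) in reported.items():
        s = log_hr_summary(hr, lo, hi)
        print(f"{label}: SE={s['se_log_hr']:.3f}, z={s['z']:.2f}, "
              f"CI excludes 1: {s['ci_excludes_1']}")
```

Applied to the four reported HRs, only the all-comer CIs exclude 1, while the pMMR intervals for both PFS and OS cross 1. No alpha was assigned to this subgroup analysis, so these z values are descriptive only and should not be read as formal hypothesis tests.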
Conclusion:
Lenvatinib+pembrolizumab provided clinically meaningful benefit with manageable safety compared with TPC, supporting its use in East Asian patients with previously treated advanced/recurrent EC.
10. Brain Derived Neurotrophic Factor Methylation and Long-term Outcomes after Stroke Interacting with Suicidal Ideation
Hee-Ju KANG ; Ju-Wan KIM ; Joon-Tae KIM ; Man-Seok PARK ; Byung Jo CHUN ; Sung-Wan KIM ; Il-Seon SHIN ; Robert STEWART ; Jae-Min KIM
Clinical Psychopharmacology and Neuroscience 2024;22(2):306-313
Objective:
This study aimed to evaluate the previously unexplored relationship between brain-derived neurotrophic factor (BDNF) methylation and long-term stroke outcomes, and the interaction of this relationship with suicidal ideation (SI), which is closely associated with both BDNF expression and stroke outcomes.
Methods:
A total of 278 stroke patients were assessed at 2 weeks post-stroke for BDNF methylation status and for SI, using the suicide-related item of the Montgomery–Åsberg Depression Rating Scale. The incidence of composite cerebro-cardiovascular events (CCVEs) during the 8–14 years after the initial stroke served as the long-term stroke outcome. Covariate-adjusted Cox regression models were used to evaluate the association between BDNF methylation status and CCVEs, as well as its interaction with post-stroke SI at 2 weeks.
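The abstract names covariate-adjusted Cox regression with a methylation-by-SI interaction but does not give the exact model. The following is a generic sketch in notation introduced here for illustration:
- M_i is the methylation value at one CpG site (or the average).
- S_i = 1 if SI was present at 2 weeks, otherwise 0.
- x_i holds the adjustment covariates.
- h_0(t) is the baseline hazard.

Written this way, the interaction question is whether β_3 differs from 0. The hazard ratio for a one-unit increase in methylation then depends on SI status.

```latex
h_i(t) = h_0(t)\exp\left(\beta_1 M_i + \beta_2 S_i + \beta_3 M_i S_i + \boldsymbol{\gamma}^{\top}\mathbf{x}_i\right)

\frac{h(t \mid M+1, S, \mathbf{x})}{h(t \mid M, S, \mathbf{x})} = \exp(\beta_1 + \beta_3 S)
\quad\Rightarrow\quad
\mathrm{HR}_{S=0} = e^{\beta_1}, \qquad \mathrm{HR}_{S=1} = e^{\beta_1 + \beta_3}
```

The baseline hazard cancels in the ratio, so under this parameterization a positive β_3 means that higher methylation is associated with a steeper rise in hazard among patients with SI than among those without.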
Results:
Higher methylation status at CpGs 1, 3, and 5, but not the average methylation value, predicted a greater number of composite CCVEs during the 8–14 years following the stroke. The associations of higher methylation status at CpGs 1, 3, 5, and 8, and of the average BDNF methylation value, with a greater number of composite CCVEs were prominent in patients with post-stroke SI at 2 weeks. Notably, a significant interaction between methylation status and SI on composite CCVEs was observed only for CpG 8.
Conclusion:
The significant association between BDNF methylation and poor long-term stroke outcomes, which was amplified in individuals with post-stroke SI at 2 weeks, suggests that assessing BDNF methylation status together with SI during the acute phase of stroke can help predict long-term outcomes.
