1. An Integrated Guide for Designing Video Abstracts Using Freeware and Their Emerging Role in Academic Research Advancement
Ria GUPTA ; Mrudula JOSHI ; Latika GUPTA
Journal of Korean Medical Science 2021;36(9):e66-
Video abstracts (VAs) are the motion-picture equivalent of written abstracts. With the growing use of social media platforms for post-publication promotion of research articles, VAs have gained increasing popularity among researchers in recent years. Widespread lockdowns and social distancing protocols during the pandemic further established VAs as a tool for efficient learning. Moreover, they may be the preferred medium for communicating certain types of information, such as diagnostic or therapeutic procedures, qualitative research, perspectives, and techniques. In this article, the authors discuss the role of VAs in the advancement of academic research, plausible designs, freeware for making videos, and specific considerations for crafting good VAs.
2. Accuracy, appropriateness, and readability of ChatGPT-4 and ChatGPT-3.5 in answering pediatric emergency medicine post-discharge questions
Mitul GUPTA ; Aiza KAHLUN ; Ria SUR ; Pramiti GUPTA ; Andrew KIENSTRA ; Winnie WHITAKER ; Graham AUFRICHT
Pediatric Emergency Medicine Journal 2025;12(2):62-72
Purpose:
Large language models (LLMs) like ChatGPT (OpenAI) are increasingly used in healthcare, raising questions about their accuracy and reliability for medical information. This study compared 2 versions of ChatGPT in answering post-discharge follow-up questions in the area of pediatric emergency medicine (PEM).
Methods:
Twenty-three common post-discharge questions were posed to ChatGPT-4 and -3.5, with responses generated before and after a simplification request. Two blinded PEM physicians evaluated appropriateness and accuracy as the primary endpoint. Secondary endpoints included word count and readability, the latter averaged across six established scales: the Automated Readability Index, Gunning Fog Index, Flesch-Kincaid Grade Level, Coleman-Liau Index, Simple Measure of Gobbledygook Grade Level, and Flesch Reading Ease. Differences and inter-rater agreement were assessed with t-tests and Cohen's kappa, respectively; a sketch of this scoring and comparison workflow appears after this entry.
Results:
The physician evaluations showed high appropriateness for both default responses (ChatGPT-4, 91.3%-100% vs. ChatGPT-3.5, 91.3%) and simplified responses (both 87.0%-91.3%). Accuracy was also high for default (87.0%-95.7% vs. 87.0%-91.3%) and simplified responses (both 82.6%-91.3%). Inter-rater agreement was fair overall (κ = 0.37; P < 0.001). For default responses, ChatGPT-4 produced longer outputs than ChatGPT-3.5 (233.0 ± 97.1 vs. 199.6 ± 94.7 words; P = 0.043), with similar readability (13.3 ± 1.9 vs. 13.5 ± 1.8; P = 0.404). After simplification, both LLMs improved in word count and readability (P < 0.001), with ChatGPT-4 achieving a readability level suitable for eighth-grade students in the United States (7.7 ± 1.3 vs. 8.2 ± 1.5; P = 0.027).
Conclusion:
The responses of ChatGPT-4 and -3.5 to post-discharge questions were deemed appropriate and accurate by the PEM physicians. While ChatGPT-4 showed an edge in simplifying language, neither LLM consistently met the recommended sixth-grade reading level. These findings suggest that LLMs hold potential for communicating with guardians.
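The following is a minimal, hypothetical sketch of the readability-scoring and comparison workflow referenced in the Methods above. The study does not state its tooling; the use of textstat, SciPy, and scikit-learn here is an assumption, as are the function and variable names. The abstract also does not specify how the Flesch Reading Ease score (a 0-100 scale) entered the average of grade-level indices, so this sketch averages the five grade-level indices and would report Flesch Reading Ease separately.

# Hypothetical sketch, not the authors' actual analysis code.
from statistics import mean

import textstat                      # assumed library: pip install textstat
from scipy.stats import ttest_rel    # paired t-test (pairing assumed per question)
from sklearn.metrics import cohen_kappa_score


def grade_level(text: str) -> float:
    """Average the five grade-level indices named in the Methods."""
    indices = [
        textstat.automated_readability_index(text),
        textstat.gunning_fog(text),
        textstat.flesch_kincaid_grade(text),
        textstat.coleman_liau_index(text),
        textstat.smog_index(text),   # Simple Measure of Gobbledygook
    ]
    return mean(indices)


def word_count(text: str) -> int:
    """Word count, ignoring punctuation."""
    return textstat.lexicon_count(text, removepunct=True)


def compare_versions(default_responses, simplified_responses):
    """Compare readability before and after the simplification request.

    Inputs are placeholder lists of 23 response strings (one per
    post-discharge question), not the study's data.
    """
    default_scores = [grade_level(r) for r in default_responses]
    simplified_scores = [grade_level(r) for r in simplified_responses]
    t_stat, p_value = ttest_rel(default_scores, simplified_scores)
    return t_stat, p_value


def rater_agreement(rater_a, rater_b):
    """Cohen's kappa for two physicians' binary appropriateness ratings."""
    return cohen_kappa_score(rater_a, rater_b)

A paired t-test is shown because each question yields a default and a simplified response from the same model; the abstract says only that t-tests were used, so the pairing is an assumption.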