1.Preliminary exploration of the applications of five large language models in the field of oral auxiliary diagnosis, treatment and health consultation
Cailing HAN ; Shizhu BAI ; Tingmin ZHANG ; Chen LIU ; Yuchen LIU ; Xiangxiang HU ; Yimin ZHAO
Chinese Journal of Stomatology 2025;60(8):871-878
Objective: To evaluate the accuracy of the oral healthcare information provided by different large language models (LLMs) and to explore their feasibility and limitations in oral auxiliary diagnosis, treatment and health consultation. Methods: Eight items comprising 47 questions on the diagnosis and treatment of oral diseases were designed to assess the performance of LLMs as an artificial intelligence (AI) medical assistant, and five items comprising 35 questions on oral health consultation were designed to assess their performance as a simulated doctor. Each question was answered individually by five LLMs (Erine Bot, HuatuoGPT, Tongyi Qianwen, iFlytek Spark, ChatGPT). Two attending physicians with more than 5 years of experience independently rated the responses using the 3C criteria (correct, clear, concise). Consistency between the raters was assessed using the Spearman rank correlation coefficient, and differences between the models were assessed using the Kruskal-Wallis test and Dunn post hoc test. Additionally, 600 questions from the 2023 dental licensing examination were used to evaluate the answering time, score, and accuracy of each model. Results: As an AI medical assistant, LLMs could assist doctors in diagnosis and treatment decision-making, with an inter-rater Spearman coefficient of 0.505 (P<0.01). As a simulated doctor, LLMs could provide patient education, with an inter-rater Spearman coefficient of 0.533 (P<0.01). The 3C scores of each model as an AI medical assistant and as a simulated doctor were, respectively: 2.00 (1.00, 3.00) and 2.00 (2.00, 3.00) points for Erine Bot, 1.00 (1.00, 2.00) and 2.00 (1.00, 2.00) points for HuatuoGPT, 2.00 (1.00, 2.00) and 2.00 (1.00, 3.00) points for Tongyi Qianwen, 2.00 (1.00, 2.00) and 2.00 (1.75, 2.25) points for iFlytek Spark, and 3.00 (2.00, 3.00) and 3.00 (2.00, 3.00) points for ChatGPT (full score of 4 points). The Kruskal-Wallis test showed statistically significant differences in the 3C scores among the five LLMs in both roles (all P<0.001). On the dental licensing examination, the five LLMs averaged a score of 370.2, an accuracy rate of 61.7% (370.2/600), and an answering time of 94.6 min. Specifically, Erine Bot took 115 min and scored 363 points (accuracy 60.5%, 363/600), HuatuoGPT took 224 min and scored 305 points (accuracy 50.8%, 305/600), Tongyi Qianwen took 43 min and scored 438 points (accuracy 73.0%, 438/600), iFlytek Spark took 32 min and scored 364 points (accuracy 60.7%, 364/600), and ChatGPT took 59 min and scored 381 points (accuracy 63.5%, 381/600). Conclusions: Based on the evaluation of the LLMs in their dual roles as an AI medical assistant and a simulated doctor, ChatGPT performs best, with basically correct, clear and concise answers, followed by Erine Bot, Tongyi Qianwen and iFlytek Spark, while HuatuoGPT lags significantly behind. In the dental licensing examination, the four LLMs other than HuatuoGPT reach the passing level, and all five models answer in significantly less time than the 8 h allotted by the examination regulations. LLMs are feasible for application in oral auxiliary diagnosis, treatment and health consultation, and can help both doctors and patients obtain medical information quickly. However, their outputs carry a risk of errors (the 3C scores do not reach full marks), so prudent judgment should be exercised when using them.
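The examination figures reported in this abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: it recomputes the per-model accuracy rates and the five-model averages from the times and scores stated above. The function names are hypothetical, and it assumes each of the 600 questions is worth one point, which the abstract implies but does not state.

```python
# Per-model results as reported in the abstract: (minutes taken, score).
reported = {
    "Erine Bot": (115, 363),
    "HuatuoGPT": (224, 305),
    "Tongyi Qianwen": (43, 438),
    "iFlytek Spark": (32, 364),
    "ChatGPT": (59, 381),
}

# Assumption: one point per question, so the maximum score is 600.
TOTAL_POINTS = 600


def accuracy_percent(score, total=TOTAL_POINTS):
    """Return the accuracy rate as a percentage, rounded to one decimal."""
    return round(100 * score / total, 1)


def summarize(results):
    """Average the answering time and score across all models."""
    n = len(results)
    mean_minutes = sum(minutes for minutes, _ in results.values()) / n
    mean_score = sum(score for _, score in results.values()) / n
    return {
        "mean_minutes": round(mean_minutes, 1),
        "mean_score": round(mean_score, 1),
        "mean_accuracy": accuracy_percent(mean_score),
    }


per_model = {name: accuracy_percent(score) for name, (_, score) in reported.items()}
summary = summarize(reported)

for name, acc in per_model.items():
    print(f"{name}: {acc}%")
print(summary)
```

Under these assumptions the recomputed values agree with the averages and accuracy rates given in the abstract.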
3.Epidemiological characteristics of five cases of importing yellow fever in Fujian province and strategies for prevention and control of infection in hospital
Lifen HAN ; Zhiping ZHAO ; Xiaoling YU ; Zhongqiong QIU ; Cailing HE ; Shengcan GUAN ; Shouyun XIE ; Yuhai WANG ; Lu LIU ; Hanhui YE ; Chen PAN ; Qin LI
Chinese Journal of Infectious Diseases 2016;34(11):665-669
Objective To analyze the epidemiological and clinical characteristics of 5 patients with imported yellow fever, and to explore strategies for the prevention and control of infection in hospital. Methods The epidemiological and clinical characteristics of 5 cases of imported yellow fever treated in the Infectious Disease Hospital of Fujian Medical University from March 18th to April 6th, 2016 were retrospectively reviewed and analyzed. Results All five patients came from Luanda, Angola. One of them was vaccinated before going abroad, and the others were vaccinated 1 to 10 days before disease onset in Angola. All of them had been bitten by mosquitoes, and disease onset ranged from March 11th to March 27th, before they returned to Fujian. The main clinical manifestations were fever, chills, shivering, fatigue, arthralgia, headache, and liver and kidney injury. At presentation, two patients were positive for yellow fever virus nucleic acid in serum samples and 3 patients were positive in urine samples. All patients tested negative for dengue virus and Zika virus, and no plasmodium was found in blood smears. All patients were cured and discharged. Conclusions There is a risk of yellow fever transmission in Fujian Province. Prevention and control of the disease should focus on improving the ability to detect and manage imported cases. Vaccination and health education should be provided to those travelling to epidemic countries or areas. Emergency monitoring and control of mosquitoes are necessary.
