The trustworthiness of large language models in the application of anterior teeth aesthetic restoration

Guohui ZHU; Chunxia CHEN

VernacularTitle:大语言模型应用于前牙美学修复中的可信性研究
Author: Guohui ZHU ¹ ; Chunxia CHEN ¹
Author Information

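1. Department of Prosthodontics I, Tianjin Stomatological Hospital; School of Medicine, Nankai University; Tianjin Key Laboratory of Oral Function Reconstruction, Tianjin 300041, China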
Publication Type:Journal Article
Keywords: Large language models; Anterior tooth aesthetic restoration; Recall rate; Hallucination rate; Chain of thoughts; Retrieval-augmented generation
From: Journal of Practical Stomatology 2025;41(1):88-92
Country:China
Language:Chinese
Abstract: Objective: To evaluate the trustworthiness of generative artificial intelligence, as implemented in Chinese large language models (LLMs), in addressing questions about anterior tooth aesthetic restoration, and to explore how relevant artificial intelligence techniques can improve the reliability of existing LLMs when answering questions in oral health care. Methods: Four top-tier Chinese LLMs, BaiChuan 3.0 (A), ZhiPu QingYan GLM-4 (B), Wenxin Yiyuan 3.5 (C), and QianWen (D), were used to analyze 10 items on anterior tooth aesthetic restoration. Reference standards were set using scholarly data and expert consensus, and the models' recall and hallucination rates were compared. The chain-of-thought (CoT) technique was applied to gauge its effect on answer accuracy in dental queries. Models A and B were additionally tested with retrieval-augmented generation (RAG) to assess its effect on their performance. Results: The recall rates of models A, B, C, and D were 0.4167 ± 0.13, 0.3505 ± 0.20, 0.3587 ± 0.01, and 0.5619 ± 0.04, respectively; the hallucination rates were 0.4651 ± 0.04, 0.6946 ± 0.13, 0.5018 ± 0.08, and 0.3119 ± 0.09, respectively (A vs. D, t ≈ 15.53, P < 0.05). After integrating CoT, overall recall improved, but hallucination rates rose in some models. Applying RAG in models A and B significantly improved recall and reduced hallucination rates (P < 0.05). Conclusion: The methods or features employed by the QianWen LLM showed significant advantages in improving answer accuracy and reducing misinformation, and the model was therefore more credible in addressing anterior aesthetic restoration questions. The CoT technique may raise correct response rates in some models but can also increase hallucination rates. In contrast, the RAG strategy can improve the correctness of LLM answers and decrease spurious outputs.