Application of natural language processing models in cervical cancer staging and risk factors extraction

Xiang CHEN; Shengyun FAN; Yihang ZHANG; Yi XU; Shengyu YAO; Ge YAN

VernacularTitle:自然语言处理模型在宫颈癌分期和风险因素提取中的应用
Author: Xiang CHEN ¹ ; Shengyun FAN ; Yihang ZHANG ; Yi XU ; Shengyu YAO ; Ge YAN
Author Information

1. Department of Nuclear Medicine, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 201620, China
Publication Type: Journal Article
Keywords: Cervical cancer; Natural language processing; FIGO stage; Risk factors
From: Tumor 2025;45(3):287-296
Country: China
Language: Chinese
Abstract: Objective: To evaluate the accuracy and output stability of general-purpose online natural language processing (NLP) models for staging diagnosis and for identifying medium- and high-risk factors in cervical cancer patients based on pathology reports. Methods: Surgical pathology reports of 65 patients with cervical cancer who received postoperative adjuvant radiotherapy at Shanghai General Hospital from January 2022 to December 2023 were retrospectively selected. These reports were input into two online NLP models, Kimi and ChatGPT-4o, and the staging diagnoses they output were recorded. The results were classified into 3 categories and scored as follows: correct (2 points), basically correct (1 point), and incorrect (0 points). Each pathology report was tested 5 times to assess the stability of the outputs from Kimi and ChatGPT-4o, and the consistency between the NLP models and clinical physicians in cervical cancer staging was compared. Prompt-based questioning was used to evaluate Kimi's ability to extract medium- and high-risk factors from the pathology reports of cervical cancer patients. Results: There was no statistically significant difference in the staging diagnosis results between the two NLP models and the clinical physicians (χ²=5.740, P=0.057). Kimi and ChatGPT-4o produced 56 and 47 correct results, 6 and 15 basically correct results, and 3 and 3 incorrect results, respectively. Their mean scores were 7.08±2.70 and 7.97±2.97, and the difference between them was statistically significant (P=0.040). In the extraction of risk factors from 65 cervical cancer patients, involving a total of 390 factors, Kimi made only three false-positive errors, with all other factors correctly identified. Conclusion: Online general NLP models can stably output the stage of cervical cancer patients with diagnostic accuracy comparable to that of clinical physicians. With the assistance of prompt-based questioning, these NLP models can accurately extract medium- and high-risk factors from pathology reports of cervical cancer patients, demonstrating promising clinical application potential.
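
The scoring scheme described in the Methods can be illustrated with a minimal Python sketch. It assumes, since the abstract does not state this explicitly, that each report's score is the sum of its points over the five repeated runs (range 0 to 10) and that the reported spread is a sample standard deviation. The function names and the example ratings are hypothetical and are not study data; only the point values and the 390-factor, 3-error counts come from the abstract.

```python
from statistics import mean, stdev

# Points per rating category, as stated in the abstract.
POINTS = {"correct": 2, "basically correct": 1, "incorrect": 0}


def report_score(ratings):
    """Sum the points over the repeated runs of one pathology report.

    Assumption: the per-report score is the total over all runs.
    """
    return sum(POINTS[r] for r in ratings)


def summarize(all_ratings):
    """Return (mean, sample standard deviation) of the per-report scores."""
    scores = [report_score(r) for r in all_ratings]
    return mean(scores), stdev(scores)


def extraction_accuracy(total_factors, errors):
    """Fraction of risk factors identified correctly."""
    return (total_factors - errors) / total_factors


# Hypothetical ratings for three reports, five runs each (not study data).
example = [
    ["correct"] * 5,
    ["correct", "correct", "basically correct", "correct", "incorrect"],
    ["basically correct"] * 5,
]

avg, sd = summarize(example)
print(f"mean score: {avg:.2f}, sample SD: {sd:.2f}")

# Counts taken from the abstract: 390 factors, 3 false-positive errors.
print(f"extraction accuracy: {extraction_accuracy(390, 3):.4f}")
```

Summing over runs rewards both accuracy and stability in a single number, which is one plausible reading of how the mean scores in the Results were obtained.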