Guideline-driven clinical decision support for colonoscopy patients using the hierarchical multi-label deep learning method.
10.1097/CM9.0000000000003469
- Author:
Junling WU (1); Jun CHEN (1); Hanwen ZHANG (1); Zhe LUAN (1); Yiming ZHAO (2); Mengxuan SUN (3); Shufang WANG (4); Congyong LI (5); Zhizhuang ZHAO (6); Wei ZHANG (1); Yi CHEN (1); Jiaqi ZHANG (1); Yansheng LI (7); Kejia LIU (7); Jinghao NIU (8); Gang SUN (4)
- Author Information:
1. Medical School of Chinese PLA, Beijing 100853, China.
2. Department of Gastroenterology and Hepatology, Hainan Hospital of PLA General Hospital, Sanya, Hainan 572013, China.
3. University of Chinese Academy of Sciences, Beijing 101408, China.
4. Department of Gastroenterology and Hepatology, the First Medical Center, Chinese PLA General Hospital, Beijing 100853, China.
5. Sixth Health Care Department, Second Medical Center of PLA General Hospital, Beijing 100853, China.
6. Department of Geriatrics, Hainan Hospital of PLA General Hospital, Sanya, Hainan 572013, China.
7. DHC Mediway Technology Co., Ltd, Beijing 100089, China.
8. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China.
- Publication Type: Journal Article
- Keywords:
Clinical decision support;
Colonoscopy;
Deep learning;
Hierarchical multi-label
- MeSH:
Humans;
Colonoscopy/methods*;
Deep Learning;
Decision Support Systems, Clinical;
Female;
Male
- From:
Chinese Medical Journal
2025;138(20):2631-2639
- Country: China
- Language: English
- Abstract:
BACKGROUND: Over 20 million colonoscopies are performed in China annually. An automatic clinical decision support system (CDSS) that accurately recognizes the semantics of colonoscopy reports and provides guideline-based recommendations could help relieve the growing medical burden and standardize healthcare. In this study, the CDSS was built within an interpretable hierarchical-label classification framework, trained with a state-of-the-art transformer-based model, and validated across multiple centers.
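The abstract does not describe how the hierarchical labels are decoded from the model's outputs. As a minimal sketch only, assuming the model emits an independent probability for every main label and sublabel (for example, from a sigmoid output layer), the Python code below shows one common way to keep predictions consistent with a two-level hierarchy: a sublabel is accepted only if its parent main label is also accepted. The label names, the 0.5 threshold, and the function `decode_hierarchical` are hypothetical and are not the authors' implementation.

```python
from typing import Dict, List, Set

# Hypothetical two-level hierarchy (main label -> sublabels). The study uses
# five main labels and 22 sublabels; these names are illustrative only.
HIERARCHY: Dict[str, List[str]] = {
    "normal": [],
    "polyp": ["adenoma", "hyperplastic_polyp"],
    "cancer": ["adenocarcinoma"],
}


def decode_hierarchical(probs: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Turn per-label probabilities into a hierarchy-consistent label set.

    A main label is kept if its probability reaches the threshold; a sublabel
    is kept only if it reaches the threshold AND its parent was kept.
    Labels missing from `probs` are treated as probability 0.0.
    """
    predicted: Set[str] = set()
    for main, subs in HIERARCHY.items():
        if probs.get(main, 0.0) >= threshold:
            predicted.add(main)
            for sub in subs:
                if probs.get(sub, 0.0) >= threshold:
                    predicted.add(sub)
    return predicted


if __name__ == "__main__":
    scores = {"polyp": 0.91, "adenoma": 0.80, "hyperplastic_polyp": 0.20,
              "cancer": 0.10, "adenocarcinoma": 0.60}
    # "adenocarcinoma" is dropped because its parent "cancer" is below threshold.
    print(sorted(decode_hierarchical(scores)))  # ['adenoma', 'polyp']
```

Enforcing the parent constraint at decoding time is one simple design choice; alternatives such as hierarchy-aware loss functions also exist, and the abstract does not say which, if any, the authors used.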
METHODS: We conducted stratified sampling on a previously established dataset of 302,965 electronic colonoscopy reports with pathology results, identified 2041 patient records representative of the overall features, and randomly divided them into training and testing sets at a 7:3 ratio. Each record was annotated on a network platform with five main labels and 22 sublabels, and the data were used to train three publicly available models pre-trained on Chinese corpora: bidirectional encoder representations from transformers (BERT)-base-Chinese (BC), BERT-wwm-ext-Chinese (BWEC), and ernie-3.0-base-zh (E3BZ). The performance of the trained models was compared with that of a randomly initialized model, and the best-performing model was selected. Fine-tuning was then applied to further improve its performance. The system was externally validated in five other hospitals on 3177 consecutive colonoscopy cases.
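The abstract does not state which variables defined the strata or how the sampling and splitting were implemented. As an illustration only, the sketch below uses the Python standard library to draw a fixed fraction of records from each stratum (here, a hypothetical diagnostic category) and then split the resulting subset at random into training and testing sets at 7:3. The helper names `stratified_sample` and `random_split`, the toy data, and the 10% sampling fraction are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_sample(
    records: Sequence[str], strata: Sequence[str], fraction: float, rng: random.Random
) -> List[str]:
    """Draw roughly `fraction` of the records from each stratum independently."""
    if len(records) != len(strata):
        raise ValueError("records and strata must have the same length")
    groups: Dict[str, List[str]] = defaultdict(list)
    for rec, stratum in zip(records, strata):
        groups[stratum].append(rec)
    sample: List[str] = []
    for stratum in sorted(groups):  # sorted for reproducible iteration order
        members = groups[stratum]
        k = max(1, round(len(members) * fraction))  # keep at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample


def random_split(
    records: Sequence[str], train_frac: float, rng: random.Random
) -> Tuple[List[str], List[str]]:
    """Shuffle a copy of the records and cut it into training and testing sets."""
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the split is reproducible
    reports = [f"report_{i}" for i in range(1000)]  # toy stand-ins for reports
    strata = ["normal"] * 600 + ["polyp"] * 350 + ["cancer"] * 50
    subset = stratified_sample(reports, strata, fraction=0.1, rng=rng)
    train, test = random_split(subset, train_frac=0.7, rng=rng)
    print(len(subset), len(train), len(test))  # 100 70 30
```

Sampling within each stratum preserves the category proportions of the source dataset in the subset (60/35/5 here), which is the usual motivation for stratification when a small annotated set must stay representative of a much larger one.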
RESULTS: The E3BZ pre-trained model performed best, with an overall accuracy of 90.18% and a Macro-F1 score of 69.14%. The model achieved 100% accuracy in identifying cancer cases and 99.16% accuracy for normal cases. In external validation, the model showed good consistency and performance across the five hospitals.
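A gap between accuracy and Macro-F1 commonly arises when classes are imbalanced: accuracy is dominated by frequent classes, whereas Macro-F1 averages the per-class F1 scores with equal weight, so poorly recognized rare classes pull it down. The abstract does not define how either metric was computed for the hierarchical multi-label setting, so the sketch below assumes a simple single-label, multi-class case with hypothetical toy labels. It illustrates the two definitions and does not reproduce the reported figures.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of records whose predicted label equals the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every class in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores: List[float] = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    # Toy data: 18 frequent-class records and two rare classes that are always missed.
    y_true = ["normal"] * 18 + ["rare_a", "rare_b"]
    y_pred = ["normal"] * 20
    print(f"accuracy={accuracy(y_true, y_pred):.2f}, "
          f"macro-F1={macro_f1(y_true, y_pred):.3f}")  # accuracy=0.90, macro-F1=0.316
```

In this toy case, missing the two rare classes costs only 10 points of accuracy but drives their F1 to zero, which lowers the unweighted mean sharply.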
CONCLUSIONS: The novel CDSS achieves high-level semantic recognition of colonoscopy reports, provides appropriate recommendations, and has the potential to become a powerful tool for physicians and patients. The hierarchical multi-label strategy and pre-training method should be amenable to processing a wider range of medical text in the future.