Automated classification of ICD-O-3 morphology code from pathology reports using text-mining and support vector machine
10.19485/j.cnki.issn2096-5087.2021.03.009
- Author:
PAN Jin; GONG Wei Wei; FEI Fang Rong; WANG Meng; ZHOU Xiao Yan; HU Ru Ying; ZHONG Jie Ming
- Publication Type:Journal Article
- Keywords:
neoplasm pathology text-mining support vector machine automated classification
- From:
Journal of Preventive Medicine
2021;33(3):255-258
- Country:China
- Language:Chinese
- Abstract:
Objective:To evaluate the accuracy of automated classification of ICD-O-3 morphology codes from pathology reports using text mining and a support vector machine (SVM), in order to provide a basis for automated tumor coding in Chinese.
Methods:Tumor report cards of Zhejiang residents from 2017 to 2019 were collected from the Chronic Disease Surveillance Information Management System of Zhejiang Province. Keywords were extracted from the pathology reports according to ICD-O-3, and SVM was used for automatic classification. The classification results were compared with those of 16 professionals with more than two years of experience in tumor coding, and the accuracy rate, recall rate and F-score were calculated to evaluate performance.
Results:A total of 83 082 cases from 2017 to 2019 were included and categorized into 17 morphological classifications, with 52 877 (63.65%) cases of adenocarcinoma, squamous carcinoma and transitional cell carcinoma. A total of 1 090 keywords were included in the main corpus. The overall F-score, accuracy rate and recall rate were 85.69%, 77.20% and 96.27%, respectively.
Conclusion:Text-mining combined with SVM can improve the efficiency of ICD-O-3 morphology coding; however, the accuracy needs to be further improved.
- Full text:文本分析联合支持向量机的肿瘤ICD-O-3病理形态学自动分类效果评价.pdf
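Note on the reported metrics: the following is a minimal sketch, not the authors' code, of how recall and an F-score can be computed when automated labels are compared with reference coding. It assumes that the "accuracy rate" in the abstract is what is usually called precision and that the F-score is the balanced F1; the abstract does not state either point. The function names and the toy labels are hypothetical and serve only as illustration.

```python
def precision_recall(reference, predicted, label):
    """Precision and recall for one class, given paired label sequences."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == label and p == label)  # true positives
    fp = sum(1 for r, p in pairs if r != label and p == label)  # false positives
    fn = sum(1 for r, p in pairs if r == label and p != label)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): reference coding vs. automated classification.
reference = ["adeno", "adeno", "squamous", "adeno"]
predicted = ["adeno", "squamous", "squamous", "adeno"]
p, r = precision_recall(reference, predicted, "adeno")

# Values from the abstract, read as precision and recall (assumption).
reported_f = f_score(0.7720, 0.9627)
print(f"toy precision={p:.2f}, recall={r:.2f}")
print(f"F-score from reported rates: {reported_f * 100:.2f}%")
```

Under these assumptions, the harmonic mean of 77.20% and 96.27% agrees with the reported F-score of 85.69 to two decimal places, which supports reading the "accuracy rate" as precision.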