Identification and analysis method of duplicate reports of brucellosis based on machine learning in China
10.3969/j.issn.1006-2483.2022.05.006
- VernacularTitle:基于机器学习的全国布鲁氏菌病重复报告分析方法研究
- Author:
Shuai-bing DONG 1; Li-ping WANG 2; Ye-wu ZHANG 3; Yan-fei LI 3
Author Information
1. Institute for Infectious Disease and Endemic Disease Control, Beijing Center for Disease Prevention and Control, Beijing Research Center for Preventive Medicine, Beijing 100013, China
2. Division of Infectious Disease Control and Prevention, Key Laboratory for Surveillance and Early Warning of Infectious Disease, Chinese Center for Disease Control and Prevention, Beijing 102206, China
3. Public Health Surveillance and Information Service Center, Chinese Center for Disease Control and Prevention, Beijing 102206, China
- Publication Type:Journal Article
- Keywords:
Brucellosis;
Notifiable Disease Report System;
Machine learning;
Data quality;
Duplicate report
- From:
Journal of Public Health and Preventive Medicine
2022;33(5):29-31
- Country:China
- Language:Chinese
- Abstract:
Objective: To study the identification of duplicate brucellosis report cards using machine learning. Methods: Using the 499 577 brucellosis case cards reported to the National Notifiable Disease Report System from 2005 to 2017, and taking 3 785 manually identified duplicate cards as the reference, a data set and related features were constructed for machine learning. K-nearest neighbor (KNN), support vector machine (SVM), and random forest models were trained, and the trained models were then used for classification and prediction. Results: The area under the curve (AUC) values of the KNN, SVM, and random forest models were 0.97, 0.97, and 0.98, respectively. Conclusions: All three models (KNN, SVM, and random forest) identified duplicate cards well; the random forest model performed best, followed by the SVM model. Machine learning methods can effectively identify accumulated duplicate brucellosis cards and have practical value for the data analysis and report management of notifiable infectious diseases.
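The models above are compared by AUC. As a minimal sketch of how such a value can be computed, assuming AUC here means the area under the ROC curve, the function below uses the pairwise definition: the proportion of (duplicate, non-duplicate) card pairs in which the duplicate receives the higher score. The function name `roc_auc` and the labels and scores are hypothetical illustrations, not data or code from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a duplicate card, 0 for a non-duplicate card.
    scores: model scores, higher meaning "more likely a duplicate".
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 3 duplicate cards and 5 non-duplicate cards.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of cards, so it suits small illustrations only; for a data set the size of the one described, a rank-based implementation from an established library would be the more practical choice.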