Cleaning the surveillance data of Keshan disease in 2009, China
10.3760/cma.j.issn.2095-4255.2014.04.024
- VernacularTitle:2009年全国克山病监测数据清洗结果
- Author:
Zhongming SUN
;
Jie HOU
;
Tong WANG
;
Bainan XU
;
Lili ZHAO
;
Shie LI
;
Chao YE
;
Yan WANG
;
Hongyang PANG
- Publication Type:Journal Article
- Keywords:
Data cleaning;
Keshan disease;
Outcome assessment
- From:
Chinese Journal of Endemiology
2014;(4):442-445
- Country:China
- Language:Chinese
- Abstract:
Objective To identify potential problems in the national surveillance data of Keshan disease (KSD) and their solutions, in order to improve the quality of the surveillance data and the reliability of the results. Methods Four key variables (name, sex, age, and KSD diagnosis) in the 2009 national KSD surveillance data were cleaned using SPSS 15.0. Cleaning covered duplicate records, missing values, outliers, and logic errors. Name, sex, age, township of current residence, village of current residence, and other variables were combined into different filters to find duplicate records with the Identify Duplicate Cases command; the duplicate records were then returned to the data reporting agencies and finally deleted or merged. Records with missing values, outliers, or logic errors were found with the Frequencies, Descriptives, and Select If commands, and these records were likewise returned to the data reporting agencies. Data were revised based not only on the agencies' feedback but also on the relationships between variables and on consultation with KSD clinical experts. Results A total of 464 duplicate records were found and cleaned. There were 2 047 missing values (name 0, sex 3, age 32, and KSD diagnosis 2 012) and 1 988 outliers (name 6, sex 3, age 10, and KSD diagnosis 1 969). Records with the 5 kinds of logic errors in KSD diagnosis totaled 105. Conclusion The national KSD surveillance data contain duplicate records, missing values, outliers, and logic errors; data cleaning could improve the quality of the surveillance data and ensure the authenticity and reliability of the monitoring data.
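
The cleaning steps described in the Methods can be illustrated with a minimal sketch. The study itself used SPSS 15.0; the Python below is not the authors' procedure. The field names, the sample records, and the age bounds of 0 to 120 are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical records: field names and values are illustrative, not study data.
records = [
    {"name": "A", "sex": "M", "age": 34, "township": "T1", "village": "V1", "diagnosis": "chronic"},
    {"name": "A", "sex": "M", "age": 34, "township": "T1", "village": "V1", "diagnosis": "chronic"},
    {"name": "B", "sex": None, "age": 51, "township": "T1", "village": "V2", "diagnosis": "latent"},
    {"name": "C", "sex": "F", "age": 230, "township": "T2", "village": "V3", "diagnosis": None},
]

# Assumed filter for duplicates and assumed list of key variables to check.
KEY_FIELDS = ("name", "sex", "age", "township", "village")
CHECKED_FIELDS = ("name", "sex", "age", "diagnosis")


def find_duplicates(rows, key_fields=KEY_FIELDS):
    """Group row indices that share the same values on every key field."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[f] for f in key_fields)].append(i)
    return [idx for idx in groups.values() if len(idx) > 1]


def count_missing(rows, fields=CHECKED_FIELDS):
    """Count None or empty-string values per field."""
    return {f: sum(1 for r in rows if r[f] in (None, "")) for f in fields}


def find_age_outliers(rows, low=0, high=120):
    """Return indices of rows whose non-missing age lies outside [low, high]."""
    return [
        i for i, r in enumerate(rows)
        if r["age"] is not None and not (low <= r["age"] <= high)
    ]
```

In this sketch the flagged row indices would be the ones sent back to the reporting agency for confirmation before any record is deleted, merged, or revised, mirroring the feedback step in the abstract.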