Abstract
Objective To identify problems in the national surveillance data of Keshan disease (KSD) and ways to resolve them, in order to improve the quality of the surveillance data and the reliability of the results.
Methods Four key variables (name, sex, age, and KSD diagnosis) in the 2009 national KSD surveillance data were cleaned using SPSS 15.0. The cleaning covered duplicate records, missing values, outliers, and logic errors. Name, sex, age, township of current residence, village of current residence, and other variables were combined into different filters to find duplicate records with the Identify Duplicate Cases command; the duplicate records were returned to the data reporting agencies and then deleted or merged. Records with missing values, outliers, or logic errors were found with the Frequencies, Descriptives, and Select If commands, and these records were likewise returned to the data reporting agencies. Data were revised based on the agencies' feedback, on the relationships between variables, and on consultation with KSD clinical experts.
Results A total of 464 duplicate records were found and cleaned. The number of missing values was 2 047 (name 0, sex 3, age 32, and KSD diagnosis 2 012). The number of outliers was 1 988 (name 6, sex 3, age 10, and KSD diagnosis 1 969). Five kinds of logic errors in KSD diagnosis were found, affecting 105 records in all.
Conclusion The national KSD surveillance data contain duplicate records, missing values, outliers, and logic errors. Data cleaning can improve the quality of the surveillance data and help ensure the authenticity and reliability of the monitoring data.
Key words: Data cleaning; Keshan disease; Outcome assessment
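The screening steps described in the Methods can be illustrated with a minimal sketch. The study itself used SPSS 15.0; the Python below is only an analogue, and the field names, code values, age range, and sample records are all hypothetical assumptions, not taken from the surveillance data. Logic-error checks are left out because the abstract does not state the rules that were applied.

```python
from collections import defaultdict

# Hypothetical records; field names and codes are assumptions for illustration only.
records = [
    {"id": 1, "name": "A", "sex": 1, "age": 34, "township": "T1", "village": "V1", "ksd": 0},
    {"id": 2, "name": "A", "sex": 1, "age": 34, "township": "T1", "village": "V1", "ksd": 0},
    {"id": 3, "name": "B", "sex": None, "age": 51, "township": "T1", "village": "V2", "ksd": 1},
    {"id": 4, "name": "C", "sex": 2, "age": 250, "township": "T2", "village": "V3", "ksd": 0},
    {"id": 5, "name": "D", "sex": 2, "age": 40, "township": "T2", "village": "V3", "ksd": 9},
]

KEY_FIELDS = ("name", "sex", "age", "township", "village")  # assumed duplicate filter
VALID_CODES = {"sex": {1, 2}, "ksd": {0, 1}}                # assumed coding scheme
AGE_RANGE = (0, 120)                                        # assumed plausible range


def find_duplicates(rows, keys):
    """Group record ids by the combined key fields; keep groups with more than one id."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def find_missing(rows, fields):
    """List record ids with a missing (None) value, per field."""
    return {f: [r["id"] for r in rows if r[f] is None] for f in fields}


def find_outliers(rows):
    """List record ids whose non-missing values fall outside the assumed codes or range."""
    flagged = defaultdict(list)
    for r in rows:
        for field, allowed in VALID_CODES.items():
            if r[field] is not None and r[field] not in allowed:
                flagged[field].append(r["id"])
        if r["age"] is not None and not (AGE_RANGE[0] <= r["age"] <= AGE_RANGE[1]):
            flagged["age"].append(r["id"])
    return dict(flagged)


duplicates = find_duplicates(records, KEY_FIELDS)
missing = find_missing(records, ("name", "sex", "age", "ksd"))
outliers = find_outliers(records)
```

In this sketch the flagged ids would be sent back to the reporting agency for checking rather than corrected automatically, which mirrors the feedback step described above.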