Abstract

Objectives

We evaluated the error-detection performance of the DetectDeviatingCells (DDC) algorithm, which flags data anomalies in continuous variables at the observation (casewise) and variable (cellwise) level. We compared its performance with that of other approaches in a simulated dataset.

Study Design and Setting

We simulated height and weight data for hypothetical individuals aged 2–20 years and altered a proportion of height values according to predetermined error patterns. We then applied the DDC algorithm and other error-detection approaches (descriptive statistics, plots, fixed-threshold rules, and classic and robust Mahalanobis distance). We compared their error-detection performance using sensitivity, specificity, likelihood ratios, predictive values, and receiver operating characteristic (ROC) curves.

Results

At our chosen thresholds, error-detection specificity was excellent for all methods across all scenarios, and sensitivity was higher for multivariable and robust methods. The performance of the DDC algorithm was similar to that of other robust multivariable methods. Analysis of ROC curves suggested that all methods performed comparably for gross errors (e.g., a wrong measurement unit), but the DDC algorithm outperformed the others for more complex error patterns (e.g., transcription errors that are extreme yet still plausible).

Conclusions

The DDC algorithm has the potential to improve error-detection processes for observational data.
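
The classic Mahalanobis-distance rule named among the comparison methods can be illustrated with a minimal Python sketch. It is neither the authors' code nor the DDC algorithm. It assumes two continuous variables, hypothetical toy data (not the study's simulation), the non-robust sample mean and covariance, and an assumed cutoff at the 0.975 quantile of a chi-square distribution with 2 degrees of freedom. The function names mahalanobis_sq, chi2_quantile_2df and flag_cases are hypothetical.

```python
import math
import random

def mahalanobis_sq(points):
    """Squared classic Mahalanobis distance of each (height, weight) pair from
    the sample mean, using the non-robust sample covariance (n - 1 denominator)."""
    n = len(points)
    mh = sum(h for h, _ in points) / n
    mw = sum(w for _, w in points) / n
    shh = sum((h - mh) ** 2 for h, _ in points) / (n - 1)
    sww = sum((w - mw) ** 2 for _, w in points) / (n - 1)
    shw = sum((h - mh) * (w - mw) for h, w in points) / (n - 1)
    det = shh * sww - shw ** 2
    # Closed-form 2x2 inverse: (1 / det) * [[sww, -shw], [-shw, shh]]
    return [
        (sww * (h - mh) ** 2 - 2 * shw * (h - mh) * (w - mw) + shh * (w - mw) ** 2) / det
        for h, w in points
    ]

def chi2_quantile_2df(p):
    # With 2 degrees of freedom the chi-square CDF is 1 - exp(-q / 2),
    # so the quantile has the closed form -2 * ln(1 - p).
    return -2.0 * math.log(1.0 - p)

def flag_cases(points, p=0.975):
    """Flag a row (casewise) when its squared distance exceeds the chi-square(2) p-quantile."""
    cutoff = chi2_quantile_2df(p)
    return [d2 > cutoff for d2 in mahalanobis_sq(points)]

# Hypothetical toy data (not the study's simulation): 50 heights in cm and
# correlated weights in kg, then one height entered in metres instead of cm.
rng = random.Random(42)
data = []
for _ in range(50):
    h = rng.gauss(150, 10)
    data.append((h, 45 + 0.5 * (h - 150) + rng.gauss(0, 4)))
data[7] = (data[7][0] / 100, data[7][1])  # simulated wrong-unit error in row 7

flags = flag_cases(data)
print([i for i, f in enumerate(flags) if f])
```

Because the mean and covariance here are not robust, a gross error inflates the covariance and can mask other errors; robust variants (for example, those based on the minimum covariance determinant) replace these estimates with ones that resist contamination.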

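The performance measures listed in the abstract all follow from cross-tabulating flagged values against truly erroneous values, which are known in a simulation because the errors are injected deliberately. The following is a minimal Python sketch under that assumption; the function names and toy labels are hypothetical. The ROC area is computed with the Mann-Whitney formulation: the probability that a randomly chosen erroneous value receives a higher score than a randomly chosen correct one, with ties counted as one half. This is one standard way of obtaining it, not necessarily the one used in the study.

```python
import math

def _ratio(num, den):
    # Zero denominator: infinity for a positive numerator (e.g. LR+ at perfect
    # specificity), otherwise NaN.
    if den:
        return num / den
    return math.inf if num > 0 else math.nan

def detection_metrics(flags, truth):
    """Sensitivity, specificity, likelihood ratios and predictive values
    from per-value error flags and the true error labels."""
    pairs = list(zip(flags, truth))
    tp = sum(1 for f, t in pairs if f and t)
    fp = sum(1 for f, t in pairs if f and not t)
    fn = sum(1 for f, t in pairs if not f and t)
    tn = sum(1 for f, t in pairs if not f and not t)
    sens = _ratio(tp, tp + fn)
    spec = _ratio(tn, tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": _ratio(sens, 1 - spec),   # positive likelihood ratio
        "lr_neg": _ratio(1 - sens, spec),   # negative likelihood ratio
        "ppv": _ratio(tp, tp + fp),         # positive predictive value
        "npv": _ratio(tn, tn + fn),         # negative predictive value
    }

def roc_auc(scores, truth):
    """Area under the ROC curve via the Mann-Whitney statistic."""
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    if not pos or not neg:
        return math.nan
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical labels: 3 injected errors among 10 values; the method flags
# two of the errors plus one correct value.
truth = [True, True, True] + [False] * 7
flags = [True, True, False] + [False] * 6 + [True]
print(detection_metrics(flags, truth))
print(roc_auc([3.0, 1.0, 2.0, 0.0], [True, True, False, False]))
```

A fixed threshold yields one point on the ROC curve, whereas the area summarizes performance over all thresholds of a continuous score such as a distance.
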