Abstract

Almost every researcher has encountered observations that “drift” from the rest of the sample, suggesting some inconsistency. The aim of this paper is to propose a new method, based on Geostatistics, for detecting inconsistent observations in continuous geospatial data, regardless of the generating cause (measurement errors, execution errors, or the inherent variability of the data). Geostatistics was chosen for its favourable characteristics, such as avoiding systematic errors. A new detection method is needed because some existing methods applied to geospatial data rely on theoretical assumptions that are rarely met. Likewise, the choice of data set reflects the importance of LiDAR (Light Detection and Ranging) technology in the production of Digital Elevation Models (DEM). With the new methodology it was possible to detect and map discrepant data. Comparison with a widely used detection method, the BoxPlot, confirmed the relevance and usefulness of the new method, since the BoxPlot did not flag any observation as discrepant. The proposed method indicated that, on average, 1.2% of the observations are possible regionalized lower outliers and, on average, 1.4% are possible regionalized upper outliers, relative to the data set used in the study.
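For context, the sketch below illustrates the conventional BoxPlot (Tukey fence) rule that the abstract uses as the comparison baseline: values falling outside Q1 − k·IQR or Q3 + k·IQR are flagged as discrepant. The function name, the fence multiplier k = 1.5, and the synthetic elevation values are illustrative assumptions, not the study's data or implementation.

```python
import numpy as np

def boxplot_outliers(values, k=1.5):
    """Flag observations outside the Tukey boxplot fences.

    This is the classical, non-spatial rule: any value below
    Q1 - k*IQR or above Q3 + k*IQR is marked as discrepant.
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return (values < lower_fence) | (values > upper_fence)

# Synthetic elevation values (metres) stand in for the LiDAR-derived DEM
# heights used in the study; real data would be read from the survey files.
rng = np.random.default_rng(0)
elevations = rng.normal(loc=750.0, scale=5.0, size=1000)
flags = boxplot_outliers(elevations)
print(f"{flags.sum()} of {flags.size} points flagged as discrepant")
```

Because this rule ignores the spatial arrangement of the points, it can miss values that are unremarkable globally but inconsistent with their neighbours, which is the gap the proposed geostatistical method addresses.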

Highlights

  • It is very likely that most researchers have encountered data in which some observations are very different from the rest, suggesting any number of issues: that the data are naturally or legitimately erratic, that the data-generating mechanism is not the same, or that the unusual data belong to another population.

  • New methods for handling outliers have been developed to meet the demands of various areas of scientific knowledge, as in the case of Hongxing et al. (2001) for spatial data distributed on irregular grids, Barua and Alhajj (2007) for image processing, Qiao et al. (2013) for satellite data, and Appice et al. (2014) for geophysical data streams.

  • Studying the detection of outliers is important because the first step in data analysis consists of evaluating data quality.

Introduction

It is very likely that most researchers have encountered data in which some observations are very different from the rest, suggesting any number of issues: that the data are naturally or legitimately erratic, that the data-generating mechanism is not the same, or that the unusual data belong to another population. These observations are considered inconsistent and are commonly called outliers, or discrepant data. Many authors have contributed to this subject (see, e.g., Anscombe, 1960; Grubbs, 1969; Beckman and Cook, 1983; Rousseeuw and van Zomeren, 1990; Muñoz-Garcia et al., 1990; Barnett and Lewis, 1994), among other pioneers. Some of these authors assert that concern about disparate data is as old as the first attempts to analyse a data set, as in the case of Bernoulli's comments in 1777 about the existence of such data. During the outlier-detection phase it is important to evaluate each observation in depth, to discuss its impact on the analysis, and to consider whether to include or exclude it, since in some situations all subsequent work can be invalidated by a decision taken at the beginning of the data analysis (Muñoz-Garcia et al., 1990).
