Abstract

Detecting boundary points (including outliers) is often more interesting than detecting normal observations, because such points can represent valid, interesting, and potentially valuable patterns. Because a data representation can uncover the intrinsic structure of the data, we present an efficient representation-based method for detecting boundary points, which are generally located around the margin of densely distributed data such as a cluster. When a point is represented as an affine combination of other points, the negative components of its representation generally correspond to boundary points among the points in that combination. We propose the reverse unreachability of a point to measure the degree to which it is a boundary point; it is calculated by counting the number of zero and negative components in the representation. Reverse unreachability explicitly takes the global data structure into account and reveals the disconnectivity between a data point and the other points. We show that points in lower-density regions have higher reverse unreachability: the greater the value, the more likely the point is a boundary point, and an outlier scores higher than a boundary point, so the top-ranked points can be identified as outliers. Compared with related methods, our method better reflects the characteristics of the data and detects outliers and boundary points simultaneously, regardless of their distribution and the dimensionality of the space. Experimental results on a number of synthetic and real-world data sets demonstrate the effectiveness and efficiency of our method.
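The counting idea can be illustrated with a minimal sketch. The abstract does not say how the representation is computed, so everything below is an assumption made for illustration only: each point is reconstructed as a regularized affine combination of its k nearest neighbours (in the style of locally linear embedding weights), and the score of point j is the number of other points whose representation gives j a zero or negative coefficient. The function name `reverse_unreachability` and the parameters `k` and `reg` are hypothetical and are not taken from the paper.

```python
import numpy as np


def reverse_unreachability(X, k=8, reg=1e-3):
    """Illustrative score: for each point j, count the other points whose
    affine representation assigns j a zero or negative coefficient.

    ASSUMPTION: the representation is an LLE-style regularized affine
    reconstruction from the k nearest neighbours; the paper may differ.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")

    # Pairwise squared distances; a point is never its own neighbour.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)

    W = np.zeros((n, n))  # W[i, j] = coefficient of point j in the representation of i
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]
        Z = X[nbrs] - X[i]                      # neighbours centred on x_i
        G = Z @ Z.T                             # local Gram matrix (k x k)
        G += reg * (np.trace(G) + 1e-12) * np.eye(k)  # keep the system well-posed
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()                # affine constraint: coefficients sum to 1

    nonpositive = W <= 0                        # zero (unused) or negative coefficients
    np.fill_diagonal(nonpositive, False)        # ignore a point's own entry
    return nonpositive.sum(axis=0)              # column-wise count = score per point


# Usage: one Gaussian cluster plus a single far-away point (index 60).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(60, 2)), [[25.0, 25.0]]])
scores = reverse_unreachability(X, k=8)
ranking = np.argsort(-scores)  # highest scores first: outlier and boundary candidates
```

In this toy setting the isolated point is never selected as a neighbour by any cluster point, so every coefficient on it is zero and it receives the maximum possible score, n - 1. That matches the abstract's claim that outliers rank above boundary points, but only under the assumptions stated above.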
