Abstract

An outlier is generally considered to be a data point that deviates from the "bulk" of the data. Outlier diagnosis therefore raises two questions: (1) how far is an object from the bulk, and (2) how many data points does the "bulk" include? To address both questions simultaneously, the h-outlyingness index (HOI) is defined as follows: for a given data point in a data set of N data points, if at most M% of the (N − 1) one-to-rest distances are no less than M% of all the N(N − 1)/2 pairwise distances, the HOI value of that data point is M%. HOI was applied to outlier diagnosis in simulated and real data sets, and the results were compared with those obtained by several robust statistical methods; HOI gave results similar to those of the traditional methods. For high-dimensional data, it is advisable to compute HOI after dimension reduction by a method such as principal component analysis (PCA). HOI was demonstrated to be a simple, easy-to-compute, robust and effective index for outlier diagnosis. Moreover, HOI is a nonparametric method that makes no assumptions about the underlying data distribution, which should make it useful in chemometrics for multivariate outlier diagnosis.
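
The definition above can be illustrated with a minimal Python sketch. It rests on one possible reading of the abstract, not on the authors' published procedure: each one-to-rest distance is ranked by the fraction of all pairwise distances it equals or exceeds, and the HOI of a point is taken as the largest M such that at least M% of its one-to-rest distances reach at least the M% level (by analogy with the h-index). The function name `hoi_scores`, the use of Euclidean distance and the handling of ties are assumptions made for illustration only.

```python
import bisect
import math


def hoi_scores(points):
    """Return one HOI-like score in [0, 1] per point (assumed reading of the definition)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")

    # Symmetric Euclidean distance matrix; only the upper triangle is computed.
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            dist[i][j] = d
            dist[j][i] = d

    # All N(N - 1)/2 pairwise distances, sorted for rank lookups.
    pairwise = sorted(dist[i][j] for i in range(n) for j in range(i + 1, n))
    total = len(pairwise)

    scores = []
    for i in range(n):
        # Fraction of all pairwise distances that each one-to-rest distance
        # equals or exceeds, sorted from largest to smallest.
        ranks = sorted(
            (bisect.bisect_right(pairwise, dist[i][j]) / total
             for j in range(n) if j != i),
            reverse=True,
        )
        # h-index-style step: the k largest ranks make up k/(N - 1) of the
        # one-to-rest distances and all reach at least ranks[k - 1].
        best = max(min((k + 1) / (n - 1), r) for k, r in enumerate(ranks))
        scores.append(best)
    return scores


# Four clustered points and one isolated point on a line.
data = [(0.0,), (1.0,), (2.0,), (3.0,), (100.0,)]
scores = hoi_scores(data)
```

Under these assumptions the isolated point at 100 receives a score of 0.75, while each of the four clustered points receives 0.5, so the isolated point is ranked as the most outlying.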
