Abstract

Unusual combinations of predictor values in multivariate regression often distort the output, and filtering out those observations is a difficult task. This concern is prevalent in ecological domains, especially with soil samples, because the data sets are heteroscedastic and heterogeneous. When there is little domain knowledge about the combinatorial criterion for leverage points, it is advantageous to derive a labelled framework that differentiates the unusual observations. This study proposes a novel framework that integrates the quantiles and the proximity matrix of a Quantile Regression Forest and is built from the training data set. Unlike other supervised anomaly detection algorithms, it requires no prior knowledge about the samples for training, because the algorithm works in a self-learning mode. The outcome is two sets of observations: regular points and leverage points. When unseen data arrive, the proximity of their regressors to these two observation sets is the demarcation criterion. Three real datasets are used, and the outcome of the proposed approach is verified using Principal Component Analysis, Local Outlier Factor, and Gaussian Mixture Models. The algorithm’s results are promising, suggesting a new direction: using supervised techniques for inlier-based outlier detection without demanding any prior knowledge of the observations.
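The proximity-based demarcation step can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the usual random-forest notion of proximity (the fraction of trees in which two samples land in the same leaf) and assumes that an unseen sample takes the label of the set with the higher mean proximity. The function names, the tie-breaking rule, and the toy leaf indices are hypothetical. In practice the leaf indices would come from a fitted forest, for example via scikit-learn's `RandomForestRegressor.apply`.

```python
from statistics import mean


def proximity(leaves_a, leaves_b):
    """Fraction of trees in which two samples fall into the same leaf."""
    if len(leaves_a) != len(leaves_b):
        raise ValueError("both samples must be described by the same number of trees")
    return sum(a == b for a, b in zip(leaves_a, leaves_b)) / len(leaves_a)


def mean_proximity(sample, group):
    """Average proximity of one sample to every member of a labelled set."""
    return mean(proximity(sample, member) for member in group)


def label_unseen(sample, regular, leverage):
    """Assign the label of the closer set; ties go to 'regular' (an assumption)."""
    if mean_proximity(sample, leverage) > mean_proximity(sample, regular):
        return "leverage"
    return "regular"


# Toy leaf indices for a hypothetical 4-tree forest (one row per sample).
regular = [[1, 4, 2, 7], [1, 4, 3, 7], [1, 5, 2, 7]]
leverage = [[9, 8, 6, 0], [9, 8, 6, 1]]

print(label_unseen([1, 4, 2, 7], regular, leverage))  # regular
print(label_unseen([9, 8, 5, 0], regular, leverage))  # leverage
```

Comparing mean proximities is only one possible rule. A maximum-proximity or k-nearest rule would fit the same description, and the abstract does not say which one is used.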
