Abstract

Multiple observation sequences are collected, among which there is a small subset of outliers. A sequence is considered an outlier if the observations therein are generated by a mechanism different from that generating the observations in the majority of sequences. In the universal setting, the goal is to identify all the outliers without any knowledge about the underlying generating mechanisms. In prior work, this problem was studied as a universal hypothesis testing problem, and a generalized likelihood test was constructed and its asymptotic performance characterized. Here a connection is made between the generalized likelihood test and clustering algorithms from machine learning. It is shown that the generalized likelihood test is equivalent to combinatorial clustering over the probability simplex with the Kullback-Leibler divergence being the dissimilarity measure. Applied to synthetic data sets for outlier hypothesis testing, the performance of the generalized likelihood test is shown to be superior to that of a number of other clustering algorithms for sufficiently large sample sizes.
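The clustering view described above can be illustrated with a small sketch. The following is a minimal, hypothetical implementation (not the paper's exact test statistic) of the single-outlier case: each sequence is reduced to its empirical distribution on the probability simplex, and for each candidate outlier the remaining sequences are scored by the sum of KL divergences to their pooled empirical distribution; the candidate whose removal leaves the most homogeneous cluster is declared the outlier. All function names and the scoring rule here are illustrative assumptions.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) between discrete pmfs."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], eps))))

def empirical_distribution(seq, alphabet_size):
    """Empirical pmf (type) of a sequence over {0, ..., alphabet_size - 1}."""
    counts = np.bincount(seq, minlength=alphabet_size)
    return counts / counts.sum()

def single_outlier_by_kl_clustering(sequences, alphabet_size):
    """Illustrative single-outlier detector via KL-based clustering.

    For each candidate outlier i, estimate the typical distribution by
    pooling (averaging) the types of the other sequences, then score the
    hypothesis by the total KL divergence of those types to the pooled
    estimate. The candidate whose removal yields the smallest score (the
    tightest remaining cluster) is declared the outlier.
    """
    types = [empirical_distribution(s, alphabet_size) for s in sequences]
    best_i, best_score = None, np.inf
    for i in range(len(sequences)):
        rest = [t for j, t in enumerate(types) if j != i]
        pooled = np.mean(rest, axis=0)
        score = sum(kl_divergence(t, pooled) for t in rest)
        if score < best_score:
            best_i, best_score = i, score
    return best_i
```

For instance, with five sequences of 500 samples each over a 4-letter alphabet, where one sequence is drawn from a skewed distribution and the rest from the uniform distribution, the detector recovers the skewed sequence for sufficiently large sample sizes, consistent with the asymptotic regime discussed above.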
