Abstract

In outlier hypothesis testing, multiple observation sequences are collected, a small subset of which are outliers. Observations in an outlier sequence are generated by a mechanism different from the one that generates the observations in the majority of the sequences. The goal is to best discern all the outlier sequences without any knowledge of the underlying generating mechanisms. A generalized likelihood test is considered in the fixed sample size setting. In the sequential setting, a test based on the Multihypothesis Sequential Probability Ratio Test and the repeated significance test is considered. The sequential test outperforms the generalized likelihood test when the lengths of the observation sequences exceed certain values. Applied to a real data set for spam detection, the proposed tests are shown to outperform tests based on the maximum mean discrepancy when the sample size is large.
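
To make the fixed sample size setting concrete, the following is a minimal sketch of a generalized likelihood test under simplifying assumptions that the abstract does not state: exactly one outlier, at least three sequences of equal length, a finite alphabet, and i.i.d. observations within each sequence. Under those assumptions, maximizing the likelihood with both unknown distributions replaced by their maximum likelihood estimates reduces to choosing the sequence whose removal makes the remaining empirical distributions most similar to their pooled average. The function names (`empirical`, `kl`, `gl_single_outlier`) and the toy data are hypothetical and purely illustrative.

```python
import math
from collections import Counter


def empirical(seq, alphabet):
    """Empirical distribution of seq over a fixed, ordered alphabet."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in alphabet]


def kl(p, q):
    """KL divergence D(p || q) in nats; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def gl_single_outlier(sequences):
    """Return (index of the declared outlier, per-hypothesis scores).

    Assumes exactly one outlier, equal-length sequences, and at least
    three sequences. Hypothesis i says "sequence i is the outlier"; its
    score is the total divergence of the other sequences from their
    pooled empirical distribution. The smallest score wins.
    """
    alphabet = sorted({x for s in sequences for x in s})
    emps = [empirical(s, alphabet) for s in sequences]
    m = len(sequences)
    scores = []
    for i in range(m):
        others = [emps[j] for j in range(m) if j != i]
        # With equal lengths, the pooled estimate is the plain average.
        pooled = [sum(col) / len(others) for col in zip(*others)]
        scores.append(sum(kl(g, pooled) for g in others))
    return min(range(m), key=scores.__getitem__), scores


# Toy data: three sequences of mostly 0s and one sequence of mostly 1s.
toy = [
    [0, 0, 0, 1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 1, 1, 1, 1],
]
outlier_index, scores = gl_single_outlier(toy)
```

On the toy data the last sequence is the one declared the outlier. The sketch never needs the typical or outlier distribution as input, which is what "without any knowledge of the underlying generating mechanisms" means in this setting. Handling several outliers or sequences of unequal length would require changes that are not shown here.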
