Outlier detection for multinomial data with a large number of categories

Xiaona Yang, Zhaojun Wang, Xuemin Zi

doi:10.1142/s2010326320500082

Abstract

This paper develops an outlier detection procedure for multinomial data when the number of categories tends to infinity. Most outlier detection methods assume that the observations follow a multivariate normal distribution, yet in many modern applications the observations are measured on a discrete scale or have a natural categorical structure. For such multinomial observations, few outlier detection approaches are available. To address this gap, this work introduces a least trimmed distances estimator for multinomial data, together with a fast algorithm for identifying the clean subset. Outliers are then identified with a threshold rule derived from the asymptotic distribution of the distance measure. Furthermore, a one-step reweighting scheme is proposed to improve the efficiency of the procedure. Finally, the finite-sample performance of the method is evaluated through simulations and compared with that of existing outlier detection methods.
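The abstract names the pipeline (trimmed estimator, fast clean-subset search, asymptotic threshold, one-step reweighting) but not its formulas, so the following is only a hypothetical sketch of that general shape, not the authors' least trimmed distances estimator. It assumes a Pearson chi-square distance between each count vector and the fitted category probabilities, standardized as (D - (K-1)) / sqrt(2(K-1)) (the normal approximation to a chi-square with K-1 degrees of freedom for large K), MCD-style concentration steps to find a subset of size `h`, and a standard normal quantile as the cutoff. All names (`detect_outliers`, `h`, `alpha`, `pseudo`) are hypothetical.

```python
import math
import random
from statistics import NormalDist


def pooled_probs(rows, pseudo=0.5):
    """Category probabilities from pooled counts (assumed pseudo-count keeps every p_k > 0)."""
    k = len(rows[0])
    totals = [sum(r[j] for r in rows) + pseudo for j in range(k)]
    s = sum(totals)
    return [t / s for t in totals]


def standardized_distance(x, p):
    """Pearson chi-square distance of counts x from Multinomial(n, p),
    standardized with the large-K normal approximation (assumed distance)."""
    n, k = sum(x), len(x)
    d = sum((xj - n * pj) ** 2 / (n * pj) for xj, pj in zip(x, p))
    return (d - (k - 1)) / math.sqrt(2 * (k - 1))


def trimmed_subset(data, h, max_iter=50):
    """Concentration steps: refit p on the current subset, keep the h closest rows."""
    subset = list(range(len(data)))
    for _ in range(max_iter):
        p = pooled_probs([data[i] for i in subset])
        dist = [standardized_distance(x, p) for x in data]
        new = sorted(range(len(data)), key=lambda i: dist[i])[:h]
        if set(new) == set(subset):
            break
        subset = new
    return subset


def detect_outliers(data, h, alpha=0.01):
    """Trim, then one reweighting step; returns indices of flagged rows."""
    cut = NormalDist().inv_cdf(1 - alpha)
    clean = trimmed_subset(data, h)
    p = pooled_probs([data[i] for i in clean])
    # One-step reweighting: refit on every row under the cutoff, then flag again.
    keep = [i for i, x in enumerate(data) if standardized_distance(x, p) <= cut]
    p = pooled_probs([data[i] for i in (keep or clean)])
    return [i for i, x in enumerate(data) if standardized_distance(x, p) > cut]


# Usage on simulated data: 40 rows from a uniform multinomial over K = 50
# categories, plus 5 contaminated rows whose mass is concentrated on 5 categories.
rng = random.Random(0)
K = 50


def sample(probs, n):
    counts = [0] * K
    for j in rng.choices(range(K), weights=probs, k=n):
        counts[j] += 1
    return counts


data = [sample([1.0] * K, 500) for _ in range(40)]
data += [sample([10.0 if j < 5 else 1.0 for j in range(K)], 500) for _ in range(5)]
flagged = detect_outliers(data, h=30, alpha=0.001)
print(flagged)
```

Starting the concentration steps from the full sample is the simplest choice here; a practical implementation would typically try several random starting subsets and keep the one with the smallest trimmed objective.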
