Abstract

Outlier detection on data with missing information values is especially difficult because the uncertainty introduced by the missing values may itself contribute to an object being an outlier. A multiset-valued information system (MSVIS) is an information system (IS) whose information values are multisets, which makes it a useful way of handling datasets with missing information values. In this paper, we study outlier detection in an MSVIS based on rough set theory and granular computing. First, some concepts of multisets and probability distribution sets are reviewed, and it is shown that a weak one-to-one correspondence exists between multisets and rational probability distribution sets, so that multisets may be treated as rational probability distribution sets. Second, an MSVIS is shown to be inducible from an incomplete information system (IIS), and it can be viewed as the result of information fusion of multiple categorical ISs. Third, a tolerance relation in an MSVIS is constructed from the induced rational probability distribution sets. Fourth, the outlier factor in an MSVIS is formulated, and the corresponding outlier detection algorithm is proposed. Finally, performance evaluation by AUC (area under the curve) and F1-score shows that the proposed algorithm outperforms some existing algorithms.
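
The link between multisets and rational probability distributions described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names `multiset_to_distribution` and `induce_multisets` are hypothetical, and the rule used to fill a missing entry (replace it with the multiset of all observed values in the same attribute) is an assumption made only for illustration. The sketch also shows that two different multisets can normalize to the same distribution, which is presumably why the correspondence is called weak.

```python
from collections import Counter
from fractions import Fraction


def multiset_to_distribution(multiset):
    """Normalize a multiset (value -> multiplicity) into exact rational probabilities."""
    total = sum(multiset.values())
    if total == 0:
        raise ValueError("an empty multiset has no probability distribution")
    return {value: Fraction(count, total) for value, count in multiset.items()}


def induce_multisets(column, missing=None):
    """Turn one attribute column of an incomplete IS into multiset values.

    Assumption (not taken from the paper): a known value v becomes the
    multiset {v: 1}, and a missing value becomes the multiset of all
    observed values in the same column.
    """
    observed = Counter(v for v in column if v != missing)
    return [Counter(observed) if v == missing else Counter({v: 1}) for v in column]


# Two different multisets with the same proportions give the same distribution.
m1 = Counter({"a": 2, "b": 4})
m2 = Counter({"a": 1, "b": 2})
print(multiset_to_distribution(m1) == multiset_to_distribution(m2))  # True

# A column with one missing entry, encoded here as None.
column = ["a", "b", None, "a"]
for value in induce_multisets(column):
    print(multiset_to_distribution(value))
```

Using `Fraction` keeps every probability rational and exact, so distributions can be compared for equality without floating-point tolerance.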
