Outlier detection is a crucial task for identifying unexpected patterns, errors, and behaviors, and extracting as much valuable information as possible from ubiquitous incomplete, redundant, noisy, and mixed data poses a great challenge. To achieve efficient graph-based outlier detection, we strengthen the connectivity between similar objects and weaken the connectivity between heterogeneous objects. The network structure proposed in this paper is called an incomplete local and global neighborhood information (ILGNI) network. In this network, incomplete mixed data are exploited from two aspects: single-attribute local information and multi-attribute global information. Specifically, we first apply unsupervised attribute reduction methods to improve data quality. Then, from the perspectives of local and global information, we use the degree of similarity between objects to design strong-neighborhood and weak-similarity relations that handle incomplete data. On this basis, the topology of a large number of fine-grained neighborhood information networks can be reconstructed. Finally, outlier scores are calculated from the stationary distribution of a Markov random walk model over the ILGNI network. Experiments conducted on four real telecom fraud datasets demonstrate that the proposed algorithm achieves enhanced outlier detection performance with low time complexity. In addition, the proposed method can effectively mine information from incomplete data and is highly applicable to both feature-related and feature-independent datasets.
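The final step, scoring objects by the stationary distribution of a random walk over a similarity graph, can be illustrated with a minimal generic sketch. It is not the ILGNI construction. The abstract does not define the strong-neighborhood or weak-similarity relations, so the edge weight here is a hypothetical stand-in: the fraction of jointly observed attributes on which two objects differ by at most `delta`, with `NaN` marking a missing value. The PageRank-style `damping` (teleport) term is also an assumption, added so that the walk is ergodic and has a unique stationary distribution. Objects that the walk rarely visits receive high outlier scores.

```python
import numpy as np


def neighborhood_weights(X, delta=0.2):
    """Hypothetical edge weights: for each pair of objects, the fraction of
    jointly observed attributes (non-NaN in both) whose values differ by at
    most delta. Pairs with no common observed attribute get weight 0."""
    n = X.shape[0]
    observed = ~np.isnan(X)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            both = observed[i] & observed[j]
            if not both.any():
                continue  # nothing to compare on: leave the edge absent
            close = np.abs(X[i, both] - X[j, both]) <= delta
            W[i, j] = close.mean()
    return W


def stationary_distribution(W, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for the stationary distribution of a random walk on W,
    with an assumed uniform teleport term (probability 1 - damping)."""
    n = W.shape[0]
    row_sums = W.sum(axis=1, keepdims=True)
    # Row-normalize into a transition matrix; isolated objects jump uniformly.
    P = np.where(row_sums > 0, W / np.where(row_sums > 0, row_sums, 1.0), 1.0 / n)
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_pi = damping * (pi @ P) + (1.0 - damping) / n
        if np.abs(new_pi - pi).sum() < tol:
            return new_pi
        pi = new_pi
    return pi


def outlier_scores(X, delta=0.2, damping=0.85):
    """Score in [0, 1): low stationary probability -> high outlier score."""
    pi = stationary_distribution(neighborhood_weights(X, delta), damping)
    return 1.0 - pi / pi.max()


# Toy incomplete data: five similar objects and one distant object (index 5).
X = np.array([
    [0.00, 0.10, np.nan],
    [0.05, 0.00, 0.10],
    [0.10, 0.05, 0.00],
    [np.nan, 0.10, 0.05],
    [0.05, np.nan, 0.10],
    [5.00, 4.00, np.nan],
])
scores = outlier_scores(X)
print(np.round(scores, 3))
```

Skipping missing attributes when comparing a pair, rather than imputing them, is one simple way to let incomplete objects still form edges. The double loop costs O(n²m) and is meant only for illustration.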