Abstract
Outlier detection is crucial for identifying unexpected patterns, errors, and behaviors, yet extracting as much useful information as possible from ubiquitous data that are incomplete, redundant, noisy, and of mixed type remains a major challenge. To achieve efficient graph-based outlier detection, we strengthen the connectivity between similar objects and weaken the connectivity between heterogeneous objects. We call the proposed network structure an incomplete local and global neighborhood information (ILGNI) network. This network exploits incomplete mixed data from two perspectives: single-attribute local information and multi-attribute global information. Specifically, we first apply unsupervised attribute reduction to improve data quality. Then, from both the local and global perspectives, we use the degree of similarity between objects to define strong-neighborhood and weak-similarity relations that handle incomplete data. On this basis, the topology of a large number of fine-grained neighborhood information networks can be reconstructed. Finally, outlier scores are computed from the stationary distribution of a Markov random walk over the ILGNI network. Experiments on four real telecom fraud datasets demonstrate that the proposed algorithm improves outlier detection performance while keeping time complexity low. In addition, the proposed method effectively mines information from incomplete data and applies well to both feature-related and feature-independent datasets.
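The final scoring step can be illustrated with a minimal sketch. It is not the ILGNI construction itself: the weight matrix `W` is a hypothetical stand-in for whatever similarity network is built from the data, and the uniform restart (`damping`) and the mapping from stationary probability to score are assumptions made so the example is self-contained. The idea shown is only that objects a random walk rarely visits receive higher outlier scores.

```python
import numpy as np

def stationary_distribution(W, damping=0.9, tol=1e-10, max_iter=1000):
    """Power iteration for a random walk with uniform restart over weights W.

    W is a non-negative (n, n) similarity matrix (hypothetical input).
    The restart term is an assumption that guarantees convergence.
    """
    n = W.shape[0]
    row_sums = W.sum(axis=1, keepdims=True)
    safe = np.where(row_sums > 0, row_sums, 1.0)
    # Row-normalise; rows with no outgoing weight jump uniformly (assumption).
    P = np.where(row_sums > 0, W / safe, 1.0 / n)
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        nxt = damping * (pi @ P) + (1.0 - damping) / n
        if np.abs(nxt - pi).sum() < tol:
            return nxt
        pi = nxt
    return pi

def outlier_scores(W, damping=0.9):
    """Map low stationary probability to a high score in [0, 1) (assumed mapping)."""
    pi = stationary_distribution(W, damping)
    return 1.0 - pi / pi.max()

# Toy network: objects 0-2 are strongly connected, object 3 is weakly attached.
W = np.array([
    [0.0, 1.0, 1.0, 0.1],
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0],
])
scores = outlier_scores(W)
print(int(np.argmax(scores)))  # index of the most outlying object
```

In this toy network the weakly attached object is visited least often by the walk, so it ends up with the largest score.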