Abstract

Mining outliers in a heterogeneous information network is a challenging problem: it is unclear even what should count as an outlier in a large heterogeneous network (e.g., outliers in an entire bibliographic network consisting of authors, titles, papers, and venues). In this study, we propose an interesting class of outliers, query-based subnetwork outliers: given a heterogeneous network, a user issues a query to retrieve a set of task-relevant subnetworks, among which the subnetwork outliers are those that deviate significantly from the others (e.g., outliers among author groups studying "topic modeling"). We formalize this problem and propose a general framework in which one can query for subnetwork outliers with respect to different semantics. We introduce the notion of subnetwork similarity, which captures the proximity between two subnetworks through their membership distributions. We propose an outlier detection algorithm that ranks all subnetworks by their outlierness without parameter tuning. Quantitative and qualitative experiments on both synthetic and real data sets show that the proposed method outperforms the baselines.
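To make the idea of comparing subnetworks by their membership distributions concrete, the following is a minimal sketch and not the method of this paper. It assumes each retrieved subnetwork has already been summarized as a distribution over some set of groups, uses cosine similarity as a stand-in similarity measure, and scores outlierness as one minus the mean similarity to the other subnetworks. The names `cosine`, `rank_outliers`, and the toy `memberships` data are all hypothetical.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse distributions (dicts).

    Assumption: cosine is a stand-in; the abstract does not specify
    the similarity measure.
    """
    keys = set(p) | set(q)
    dot = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in keys)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def rank_outliers(memberships):
    """Return subnetwork names ordered from most to least outlying.

    Assumption: outlierness = 1 - mean similarity to all other
    subnetworks. This is an illustrative score only.
    """
    scores = {}
    for name, dist in memberships.items():
        others = [cosine(dist, d) for n, d in memberships.items() if n != name]
        mean_sim = sum(others) / len(others) if others else 1.0
        scores[name] = 1.0 - mean_sim
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: three author groups returned by a query,
# each described by its membership distribution over research areas.
memberships = {
    "group_a": {"topic_models": 0.8, "text_mining": 0.2},
    "group_b": {"topic_models": 0.7, "text_mining": 0.3},
    "group_c": {"topic_models": 0.1, "databases": 0.9},
}
ranking = rank_outliers(memberships)
```

In this toy input, `group_c` shares little membership mass with the other two groups, so it ranks first. The sketch has no tunable parameters, which mirrors the parameter-free ranking described above only in spirit.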
