Abstract

As heterogeneous information networks become ubiquitous and increasingly complex, many data mining tasks have been explored on them, including clustering, collaborative filtering, and link prediction. Similarity computation is a fundamental task underlying many of these problems. Although a large number of similarity measures have been developed for heterogeneous networks, they usually depend on the network schema and lack a general way of integrating the various kinds of relationships between objects. In this paper, we propose a similarity measure, HeteRank, for computing similarities in heterogeneous information networks in a general manner. The relationships between objects of different types are represented by a general relationship matrix (GRM), which is built according to the scales of the different object types. Based on the GRM, HeteRank fully integrates the multi-type relationships into similarity computation by utilizing all the meetings between objects. The HeteRank equation is further transformed into a simple binomial expression form that takes the restart probability into account. To compute HeteRank similarities efficiently, we divide the computation into two steps: the first computes intermediate values, and the second computes the similarities from those intermediate values. We then approximate the HeteRank equation by setting thresholds for skipping low intermediate values and similarity scores. A pruning algorithm is developed to reduce unnecessary visits, multiplications, and additions that contribute little to the similarity computation. Extensive experiments on real datasets demonstrate the effectiveness and efficiency of HeteRank in comparison with state-of-the-art similarity measures.
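
The two-step, threshold-pruned computation described above can be illustrated with a small sketch. The abstract does not state the HeteRank equation or how the GRM is constructed, so the code below is not the authors' method: it assumes a generic SimRank-style iteration over a column-normalized adjacency matrix that stands in for the GRM, with a `decay` factor standing in for the restart-related weighting. The names `column_normalize`, `pruned_similarity`, `inter_eps`, and `sim_eps` are hypothetical.

```python
import numpy as np


def column_normalize(adj):
    """Scale each column to sum to 1; all-zero columns are left as zeros."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    safe = np.where(col_sums > 0, col_sums, 1.0)
    return adj / safe


def pruned_similarity(adj, decay=0.8, iters=10, inter_eps=1e-4, sim_eps=1e-4):
    """SimRank-style similarity with two-step updates and threshold pruning.

    Assumption: `adj` is a stand-in for the paper's general relationship
    matrix, and the update rule is a generic SimRank-style recurrence,
    not the HeteRank equation itself.
    """
    w = column_normalize(adj)
    n = w.shape[0]
    sim = np.eye(n)
    for _ in range(iters):
        # Step 1: compute intermediate values and drop the small ones.
        inter = sim @ w
        inter[inter < inter_eps] = 0.0
        # Step 2: compute similarities from the intermediate values,
        # then drop small similarity scores.
        new_sim = decay * (w.T @ inter)
        new_sim[new_sim < sim_eps] = 0.0
        # Every object is fully similar to itself.
        np.fill_diagonal(new_sim, 1.0)
        sim = new_sim
    return sim


# Toy network: objects 0 and 1 (e.g. two authors) both link to object 2.
toy_adj = [[0, 0, 1],
           [0, 0, 1],
           [1, 1, 0]]
toy_sim = pruned_similarity(toy_adj)
print(toy_sim)
```

In this dense NumPy form the thresholds only zero out entries; they do not save any work. Skipping visits, multiplications, and additions, as the pruning algorithm in the abstract aims to do, would require a sparse representation in which pruned entries are never stored or touched.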
