Abstract

Document network is a kind of intriguing dataset which can provide both topical (textual content) and topological (relational link) information. A key point in viably modeling such datasets is to discover proper denominators beneath the two different types of data, text and link. Most previous work introduces the assumption that documents closely linked with each other share common latent topics. However, the heterophily (i.e., tendency to link to different others) of nodes is neglected, which is pervasive in social networks. In this paper, we simultaneously incorporate community detection and topic modeling in a unified framework, and appeal to Canonical Correlation Analysis (CCA) to capture the latent semantic correlations between the two heterogeneous latent factors, community and topic. Despite of the homophily (i.e., tendency to link to similar others) or heterophily, CCA can properly capture the inherent correlations which fit the dataset itself without any prior hypothesis. Logistic normal prior is also employed in modeling network to better capture the community correlations. We derive efficient inference and learning algorithms based on variational EM methods. The effectiveness of our proposed model is comprehensively verified on three different types of datasets which are namely hyperlinked networks of web pages, social networks of friends and coauthor networks of publications. Experimental results show that our approach achieves significant improvements on both topic modeling and community detection compared with the current state of the art. Meanwhile, our model is impressive in discovering correlations between extracted topics and communities.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.