Abstract
With the explosive growth in the number of academic papers, efficient academic document retrieval is becoming an essential requirement for large-scale information retrieval systems. Given its success in general document retrieval, deep semantic hashing is a promising approach for academic document retrieval, as it maps academic documents into efficient hash codes. However, for academic document retrieval, existing deep semantic hashing methods suffer from two problems: (1) they cannot differentiate the importance of different field labels; (2) they cannot fully exploit the structure information in paper citations. To address these problems, we propose a novel Large-scale Academic deep Semantic Hashing method, called LASH. Specifically, LASH first treats paper citations as a citation network, and then employs a multi-input variational deep autoencoder to directly encode both the structure information of the citation network and the semantic information of academic documents into unified hash codes. Moreover, a weighted percentage similarity, defined as a linear combination of Jaccard and Cosine similarity, is designed to measure the importance of different field labels. Supervised by this similarity, the learned unified hash codes further preserve the importance of different field labels. Extensive experiments show that LASH significantly outperforms state-of-the-art baselines on three proposed real-world large-scale academic datasets.
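To make the similarity described above concrete, the following is a minimal sketch of a linear combination of Jaccard and Cosine similarity over field labels. It is an illustration under stated assumptions, not the formulation used in LASH: the abstract does not give the exact definition, so the representation of a paper's labels as a label-to-weight dictionary, the mixing parameter `alpha`, and the function names are all hypothetical.

```python
import math


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two label sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity of two sparse label-weight vectors."""
    dot = sum(w * v.get(label, 0.0) for label, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_similarity(u: dict, v: dict, alpha: float = 0.5) -> float:
    """Hypothetical mix: alpha * Jaccard + (1 - alpha) * Cosine.

    `alpha` is an assumed parameter; the abstract does not specify
    how the two terms are weighted.
    """
    return alpha * jaccard(set(u), set(v)) + (1.0 - alpha) * cosine(u, v)


# Assumed input format: field label -> share of the paper in that field.
paper_a = {"machine learning": 0.7, "information retrieval": 0.3}
paper_b = {"machine learning": 0.2, "information retrieval": 0.8}
paper_c = {"organic chemistry": 1.0}

print(combined_similarity(paper_a, paper_b))
print(combined_similarity(paper_a, paper_c))
```

In this sketch the Jaccard term only records which labels two papers share, while the Cosine term is sensitive to how strongly each paper belongs to each label, which is one plausible way a combined score could reflect label importance.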
More From: IEEE Transactions on Knowledge and Data Engineering