Abstract

Deep learning techniques, particularly convolutional neural networks (CNNs), are poised for widespread application in information retrieval and natural language processing. However, very few publications address semantic indexing with deep learning. In particular, there are few studies of semantic indexing in biomedical literature because of several specific challenges, including the very large number of semantic labels that arise when MEDLINE citations are automatically annotated with MeSH terms, and a massive collection for which only title and abstract information is available. In this paper, we introduce a novel CNN-based semantic indexing method for collections of biomedical abstracts. First, we adaptively group word2vec categories into (coarse) subsets by clustering. Next, we construct a high-dimensional space representation with Wikipedia category extension, which contains more semantic information than bag-of-words. Thereafter, we design a hierarchical CNN indexing architecture that learns documents from a coarse-grained to a fine-grained level, using several multi-label training techniques. We believe that the low-dimensional representation of the output layer in CNNs should be more compact and effective. Finally, we perform comparative experiments on semantic indexing of biomedical abstracts. Experimental results on the MEDLINE dataset show that our model outperforms conventional models.
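The first step above (grouping category vectors into coarse subsets by clustering) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the clustering algorithm, so plain k-means is used here; the label names are hypothetical MeSH-like terms; and the 2-D vectors are toy stand-ins for real word2vec embeddings. The centroid initialization (evenly spaced points) is also a simplification chosen to keep the example deterministic.

```python
import numpy as np

def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means with a deterministic, evenly spaced init."""
    init_idx = np.linspace(0, len(points) - 1, k).astype(int)
    centroids = points[init_idx].astype(float)
    assign = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        # Move each non-empty centroid to the mean of its members.
        for c in range(k):
            members = points[assign == c]
            if len(members) > 0:
                centroids[c] = members.mean(axis=0)
    return centroids, assign

# Hypothetical fine-grained labels with toy 2-D "embeddings".
label_names = ["Neoplasms", "Carcinoma", "Sarcoma",
               "Insulin", "Glucose", "Diabetes Mellitus"]
label_vecs = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                       [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

_, assign = kmeans(label_vecs, k=2)

# Coarse subsets: cluster id -> fine-grained labels in that cluster.
coarse_groups = {}
for name, c in zip(label_names, assign.tolist()):
    coarse_groups.setdefault(c, []).append(name)

def coarse_of(fine_labels):
    """Map a document's fine-grained labels to its set of coarse labels."""
    lookup = dict(zip(label_names, assign.tolist()))
    return {lookup[name] for name in fine_labels}

print(coarse_groups)
print(coarse_of(["Carcinoma", "Glucose"]))
```

In a coarse-to-fine setup of this kind, a document's coarse label set could serve as the target for a first-level classifier, with fine-grained labels predicted only within the selected subsets; how the paper actually wires the levels together is not stated in the abstract.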

Highlights

  • Deep learning techniques, particularly convolutional neural networks (CNNs), are poised for widespread application in information retrieval and natural language processing

  • Semantic indexing [19, 21, 23] occupies an important position in document classification and information retrieval

  • Considering the large number of document classes and the uneven distribution of samples, we introduce a hierarchical CNN-based framework (HC) for semantic indexing of biomedical documents that handles both multiple labels and correlated labels


Summary

Introduction

Convolutional neural networks (CNNs) are poised for widespread application in information retrieval and natural language processing, yet very few publications address semantic indexing with deep learning. Over the last several years, deep neural networks (DNNs) [9] have emerged as a powerful machine learning technology that has achieved tremendous success in image classification, speech recognition, and natural language processing (NLP) tasks, showing significant gains over state-of-the-art shallow learning. CNNs have become more popular than fully connected DNNs. Semantic indexing [19, 21, 23] occupies an important position in document classification and information retrieval. The query and the document must be mapped into a low-dimensional space, and effectively learning this representation is a

