Abstract

High-level abstractions, such as semantic representations, are vital for document classification and retrieval. However, how to learn document semantic representations remains an open question in information retrieval and natural language processing. In this paper, we propose a new Hybrid Deep Belief Network (HDBN), which uses a Deep Boltzmann Machine (DBM) in the lower layers together with a Deep Belief Network (DBN) in the upper layers. The advantage of the DBM is that its undirected connections allow the states of the nodes in each layer to be sampled more reliably during weight training, and they also provide an effective way to remove noise from different document representation types; the DBN then extracts deeper abstractions of the document, enabling the model to learn a richer semantic representation. At the same time, we explore different input strategies for distributed semantic representation. Experimental results show that our model performs better when using word embeddings instead of single words.

Highlights

  • Semantic representation [1,2,3] is very important in document classification and document retrieval tasks

  • Considering the limitations of the DBN and the DBM for document representation, and taking both training time and model accuracy into account for document classification and retrieval tasks, we propose the Hybrid Deep Belief Network (HDBN), which uses a Deep Boltzmann Machine composed of simple two-layer Restricted Boltzmann Machines (RBMs) in the lower layers and a Deep Belief Network made up of two-layer RBMs in the upper layers

  • We explored the effects of different inputs on our HDBN model for extracting semantic information
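The layered structure described in the highlights can be sketched as a greedy stack of RBMs, each layer's hidden activations feeding the next. The sketch below is illustrative only: the layer sizes, learning rate, and CD-1 training are common defaults assumed here, not values taken from the paper, and the DBM-style versus DBN-style training distinction is reduced to plain layer-wise pretraining.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """A simple Restricted Boltzmann Machine trained with one step of
    contrastive divergence (CD-1). Sizes and hyperparameters are assumed."""
    def __init__(self, n_visible, n_hidden):
        self.W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr=0.1):
        # Positive phase: sample hidden units from the data.
        h0 = self.hidden_probs(v0)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one reconstruction step.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        # CD-1 gradient estimates, averaged over the batch.
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)

# Illustrative layer stack: bag-of-words input down to a semantic code.
layer_sizes = [2000, 500, 250, 128]
rbms = [RBM(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]

def encode(x):
    """Propagate a document vector up the stack to its semantic code."""
    for rbm in rbms:
        x = rbm.hidden_probs(x)
    return x

# Greedy layer-wise pretraining on a toy batch of document vectors.
batch = rng.random((4, layer_sizes[0]))
for rbm in rbms:
    for _ in range(5):
        rbm.cd1_step(batch)
    batch = rbm.hidden_probs(batch)  # input for the next layer
```

After pretraining, `encode` maps any document vector to a low-dimensional code that can be used for classification or retrieval by nearest-neighbour search.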



Introduction

Semantic representation [1,2,3] is very important in document classification and document retrieval tasks. LSI [4] and pLSI [5] are two dimension-reduction methods; LSI applies Singular Value Decomposition (SVD) to the document-vector matrix and remaps it into a semantic space smaller than the original one. However, these methods can still capture only very limited relations between words. Blei et al. [6] proposed Latent Dirichlet Allocation (LDA), which can extract document topics and has shown superior performance over LSI and pLSI. LDA is popular in the field of topic modeling and is also considered an effective dimension-reduction method. Nevertheless, it has some disadvantages: the semantic features it learns are not sufficient for documents, exact inference in the directed model is intractable [7, 8], and it cannot properly handle documents of different lengths
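The SVD-based remapping that LSI performs can be shown in a few lines. The toy term-document matrix and the choice of k = 2 latent dimensions below are illustrative assumptions, not data from the paper.

```python
import numpy as np

# Toy term-document count matrix: rows are terms, columns are documents.
X = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 0, 1],
    [0, 0, 3, 1],
], dtype=float)

# LSI: factor X with SVD and keep only the top-k singular directions.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2  # size of the reduced semantic space (assumed)

# Each document is remapped to a k-dimensional semantic vector.
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T
print(doc_vectors.shape)  # (4, 2): 4 documents, 2 latent dimensions
```

Similarity between documents is then computed in this k-dimensional space rather than over raw word counts, which is exactly the kind of limited, linear word-relation capture the paragraph above criticizes.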

