Understanding Short Texts through Semantic Enrichment and Hashing

Zheng Yu, Xuemin Lin, Haixun Wang, Min Wang

doi:10.1109/tkde.2015.2485224

Abstract

Clustering short texts (such as news titles) by their meaning is a challenging task. The semantic hashing approach encodes the meaning of a text into a compact binary code. Thus, to tell whether two texts have similar meanings, we only need to check whether they have similar codes. The encoding is created by a deep neural network, which is trained on texts represented by word-count vectors (the bag-of-words representation). Unfortunately, for short texts such as search queries, tweets, or news titles, such representations are insufficient to capture the underlying semantics. To cluster short texts by their meanings, we propose to add more semantic signals to short texts. Specifically, for each term in a short text, we obtain its concepts and co-occurring terms from a probabilistic knowledge base to enrich the short text. Furthermore, we introduce a simplified deep learning network consisting of 3-layer stacked auto-encoders for semantic hashing. Comprehensive experiments show that, with more semantic signals, our simplified deep learning model is able to capture the semantics of short texts, which enables a variety of applications including short text retrieval, classification, and general-purpose text processing.
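
The two ideas in the abstract, enriching a short text with knowledge-base signals and comparing binary codes, can be illustrated with a minimal sketch. It is not the authors' implementation: the knowledge base `TOY_KB`, its scores, and the helper names `enrich` and `hamming` are assumptions made up for illustration, and the trained auto-encoder that would map the enriched vector to a binary code is left out entirely.

```python
from collections import Counter

# Hypothetical toy knowledge base: term -> {concept or co-occurring term: score}.
# The entries and scores are invented for illustration only.
TOY_KB = {
    "apple": {"company": 0.6, "fruit": 0.4, "iphone": 0.5},
    "jaguar": {"animal": 0.5, "car brand": 0.5},
    "ipad": {"device": 0.8, "apple": 0.7},
}


def enrich(text, kb=TOY_KB):
    """Return a weighted bag of words: original terms plus knowledge-base signals."""
    features = Counter()
    for term in text.lower().split():
        features[term] += 1.0  # original word count
        for related, score in kb.get(term, {}).items():
            features[related] += score  # added semantic signal
    return features


def hamming(code_a, code_b):
    """Count the positions at which two equal-length binary codes differ."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    return sum(a != b for a, b in zip(code_a, code_b))
```

Here `enrich("apple ipad")` yields a feature vector that contains "device", "company", and "fruit" in addition to the two original words, which is the kind of extra signal a two-word text lacks on its own. Once texts are encoded, a small `hamming` distance between two codes is read as similar meaning.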

Journal: IEEE Transactions on Knowledge and Data Engineering
Publication Date: Feb 1, 2016
Citations: 75

Similar Papers

Understanding short texts through semantic enrichment and hashing
Zheng Yu ... Haixun Wang
01 May 2016

Short Text Entity Linking with Fine-grained Topics
Lihan Chen ... Chenhao Xie
17 Oct 2018

Leveraging Knowledge-Based Features With Multilevel Attention Mechanisms for Short Arabic Text Classification
Iyad Alagha
IEEE Access | VOL. 10
01 Jan 2021

Transferring topical knowledge from auxiliary long texts for short text clustering
Ou Jin ... Qiang Yang
24 Oct 2011
