A Wikipedia-based semantic tensor space model for text analytics

Han Joon Kim, Jae Young Chang

doi:10.1504/ijcvr.2021.115159

Abstract

This paper proposes a third-order tensor space model for representing textual documents, in which a 'concept' space is kept independent of the 'document' and 'term' spaces. In the vector space model (VSM), a document is represented as a vector in which each dimension corresponds to a term. In contrast, the model described here represents a document as a matrix. Most current text mining algorithms accept only vectors as input, and they therefore suffer from the 'term independence' and 'loss of term senses' problems. To overcome these problems, we add the 'concept' as a distinct space alongside those of the VSM. This requires producing a concept vector for each term that occurs in a given document, a task related to word sense disambiguation. As the external knowledge source for concept weighting, we use Wikipedia, which has been regarded as a source of world knowledge and has been used to improve many text mining algorithms. Through experiments on two popular document corpora, we demonstrate the superiority of the model in text clustering and text classification.
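
As a rough illustration of the representation described in the abstract, the following sketch builds a document-by-term-by-concept array with NumPy. The toy vocabulary, the concept labels, and the term-to-concept weights are all invented for illustration; the paper derives concept weights from Wikipedia, and that weighting scheme is not reproduced here. Scaling each term's concept vector by its frequency in the document is likewise an assumption, not the authors' stated method.

```python
import numpy as np

# Hypothetical vocabulary and concept labels (not taken from the paper).
TERMS = ["apple", "bank", "river"]
CONCEPTS = ["Fruit", "Technology company", "Finance", "Geography"]

# Hypothetical term-to-concept weights. In the paper these would be
# derived from Wikipedia; here they are hand-picked toy values.
TERM_CONCEPT = np.array([
    [0.6, 0.4, 0.0, 0.0],   # apple
    [0.0, 0.0, 0.7, 0.3],   # bank
    [0.0, 0.0, 0.0, 1.0],   # river
])


def term_frequencies(tokens):
    """Classic VSM view: one count per vocabulary term."""
    return np.array([tokens.count(t) for t in TERMS], dtype=float)


def document_matrix(tokens):
    """Term-by-concept matrix for one document.

    Assumption: row i is the concept vector of term i scaled by the
    frequency of term i in the document.
    """
    tf = term_frequencies(tokens)
    return tf[:, None] * TERM_CONCEPT


def corpus_tensor(docs):
    """Stack document matrices into a document x term x concept array."""
    return np.stack([document_matrix(d) for d in docs])


docs = [
    ["apple", "apple", "bank"],
    ["river", "bank"],
]
tensor = corpus_tensor(docs)
print(tensor.shape)  # (2, 3, 4)
```

In this toy setup each term's concept weights sum to one, so summing the array over the concept axis gives back the ordinary term-frequency vectors. The extra axis is what lets the two occurrences of "bank" carry different weight across the finance and geography concepts, which a plain term vector cannot express.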

Similar Papers

A Wikipedia-based semantic tensor space model for text analytics
Jae Young Chang ... Han Joon Kim
International Journal of Computational Vision and Robotics | VOL. 11
01 Jan 2020

A Semantic Text Model with a Wikipedia-based Concept Space
Han-Joon Kim ... Jae-Young Chang
The Journal of Society for e-Business Studies | VOL. 19
31 Aug 2014

Local Relevance Weighted Maximum Margin Criterion for Text Classification
Quanquan Gu ... Jie Zhou
30 Apr 2009

An improved focused crawler based on Semantic Similarity Vector Space Model
Yajun Du ... Guoli Peng
Applied Soft Computing | VOL. 36
01 Aug 2015
