A text representation model using Sequential Pattern-Growth method

Suraya Alias, Gan Keng Hoon, Siti Khaotijah Mohammad, Tan Tien Ping

doi:10.1007/s10044-017-0624-9

Abstract

Text representation is an essential task that transforms input text into features that can later be used for Text Mining and Information Retrieval tasks. The commonly used text representation models are the Bag-of-Words (BOW) and N-gram models. Nevertheless, two known issues of these models, namely inaccurate semantic representation of text and the high dimensionality of word combinations, should be investigated. A pattern-based model named Frequent Adjacent Sequential Pattern (FASP) is introduced to represent text using sets of adjacent word sequences that occur frequently across the document collection. The purpose of this study is to discover similar textual patterns between documents, which can later be converted into a set of rules that describe the main news event. FASP is based on the divide-and-conquer strategy of Pattern-Growth; the main difference between FASP and the prior technique lies in the Pattern Generation phase. The approach is tested against the BOW and N-gram text representation models on Malay and English news datasets with different term weightings in the Vector Space Model (VSM). The findings demonstrate that the FASP model shows promising performance in finding similarities between documents, with an average vector size reduction of 34% against BOW and 77% against the N-gram model on the Malay dataset. Results on the English dataset are consistent, indicating that the FASP approach is also language independent.
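The abstract does not spell out the FASP algorithm, so the following is only an illustrative sketch of the general idea it describes: growing sequences of adjacent words and keeping those that occur in enough documents. The function name `frequent_adjacent_patterns`, the parameters `min_support` and `max_len`, and the whitespace tokenization are all assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict


def frequent_adjacent_patterns(docs, min_support=2, max_len=4):
    """Return {pattern: document_support} for contiguous word sequences.

    Hypothetical helper, not the authors' FASP implementation.
    A pattern is kept if it appears in at least `min_support` documents.
    """
    tokenized = [doc.lower().split() for doc in docs]

    # Start with single words; each pattern maps to its (doc_id, start) occurrences.
    current = defaultdict(list)
    for doc_id, tokens in enumerate(tokenized):
        for start, word in enumerate(tokens):
            current[(word,)].append((doc_id, start))

    frequent = {}
    length = 1
    while current and length <= max_len:
        # Keep only patterns that meet the document-level support threshold.
        kept = {}
        for pattern, occurrences in current.items():
            support = len({doc_id for doc_id, _ in occurrences})
            if support >= min_support:
                kept[pattern] = occurrences
                frequent[pattern] = support

        # Grow each surviving pattern by the word immediately to its right,
        # looking only at the places where the pattern already occurs.
        grown = defaultdict(list)
        for pattern, occurrences in kept.items():
            for doc_id, start in occurrences:
                nxt = start + len(pattern)
                if nxt < len(tokenized[doc_id]):
                    grown[pattern + (tokenized[doc_id][nxt],)].append((doc_id, start))
        current = grown
        length += 1

    return frequent


docs = [
    "the prime minister visited the flood area",
    "flood area residents met the prime minister",
    "the prime minister spoke",
]
patterns = frequent_adjacent_patterns(docs, min_support=2)
# Show only multi-word patterns, longest first.
for pattern, support in sorted(patterns.items(), key=lambda kv: -len(kv[0])):
    if len(pattern) > 1:
        print(" ".join(pattern), support)
```

Because a pattern is only extended at positions where it is already frequent, infrequent word combinations are never enumerated, which is one plausible reason a pattern-based vocabulary can be smaller than a full N-gram vocabulary. Each document could then be represented as a vector over the surviving patterns and weighted in a Vector Space Model.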

Journal: Pattern Analysis and Applications
Publication Date: Jun 1, 2017
Citations: 9

Similar Papers

A novel semi-supervised learning framework with simultaneous text representing
Yan Zhu ... Jian Yu
Knowledge and Information Systems | VOL. 34
31 Mar 2012

A Semantic Text Model with a Wikipedia-based Concept Space
Han-Joon Kim ... Jae-Young Chang
The Journal of Society for e-Business Studies | VOL. 19
31 Aug 2014

Analysis of Changing Trends in Textual Data Representation
Ksh. Nareshkumar Singh ... A. Dorendro
01 Jan 2020

Bag of textual graphs (BoTG): A general graph‐based text representation model
Ícaro Cavalcante Dourado ... Ricardo Da Silva Torres
Journal of the Association for Information Science and Technology | VOL. 70
13 Jan 2019
