Text Classification Using the N-Gram Graph Representation Model Over High Frequency Data Streams

John Violos, Theodora Varvarigou, Iraklis Varlamis, Konstantinos Tserpes

doi:10.3389/fams.2018.00041

Abstract

A prominent challenge in our information age is classification over high-frequency data streams. In this research, we propose an innovative and highly accurate text stream classification model that is designed in an elastic, distributed way and is capable of serving text loads of fluctuating frequency. In this classification model, text is represented as N-Gram Graphs, and classification takes place using text preprocessing, graph similarity and feature classification techniques, following the supervised machine learning approach. The work involves the analysis of many variations of the proposed model and its parameters, such as various representations of text as N-Gram Graphs, graph comparison metrics and classification methods, in order to arrive at the most accurate setup. To deal with scalability, availability and timely response under high-frequency text, we employ the Beam programming model. Using the Beam programming model, the classification process occurs as a sequence of distinct tasks, which facilitates the distributed implementation of the most computationally demanding tasks of the inference stage. The proposed model and the various parameters that constitute it are evaluated experimentally, and the high-frequency stream is emulated using two public datasets (20NewsGroup and Reuters-21578) that are commonly used in the literature for text classification.
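To make the representation step concrete, the following is a minimal sketch of how a text could be turned into a character n-gram graph and compared against per-class graphs to obtain a feature vector. It is an illustration only, not the authors' implementation: the function names, the n-gram size, the window size, the two similarity measures (containment and value similarity, as commonly defined in the n-gram graph literature) and the simple weight-summing merge used to build class graphs are all assumptions made for this example.

```python
from collections import Counter


def ngram_graph(text, n=3, window=3):
    """Build a character n-gram graph.

    Nodes are character n-grams; an undirected edge links two n-grams that
    start within `window` positions of each other, and its weight counts
    how often that happens. `n` and `window` are illustrative defaults.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges = Counter()
    for i, gram in enumerate(grams):
        for other in grams[i + 1:i + 1 + window]:
            edges[tuple(sorted((gram, other)))] += 1
    return edges


def containment_similarity(g1, g2):
    """Fraction of shared edges, relative to the smaller graph."""
    if not g1 or not g2:
        return 0.0
    shared = sum(1 for edge in g1 if edge in g2)
    return shared / min(len(g1), len(g2))


def value_similarity(g1, g2):
    """Like containment, but each shared edge counts by its weight ratio."""
    if not g1 or not g2:
        return 0.0
    total = sum(min(w, g2[e]) / max(w, g2[e]) for e, w in g1.items() if e in g2)
    return total / max(len(g1), len(g2))


def merge_graphs(graphs):
    """Assumed simplification: a class graph is the sum of its document graphs."""
    merged = Counter()
    for graph in graphs:
        merged.update(graph)
    return merged


def similarity_features(text, class_graphs):
    """Feature vector: two similarity scores per class, in sorted label order."""
    graph = ngram_graph(text)
    features = []
    for label in sorted(class_graphs):
        class_graph = class_graphs[label]
        features.append(containment_similarity(graph, class_graph))
        features.append(value_similarity(graph, class_graph))
    return features
```

In a setup like this, the resulting vectors would be passed to an ordinary supervised classifier, which corresponds to the "feature classification" step mentioned in the abstract.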

Highlights

Text classification is a supervised machine learning technique that is frequently used in many applications, such as event detection [1] and sentiment analysis [2].
From the text classification perspective of our research, we divided the dataset texts into training and testing parts and carried out the experiments using 10-fold cross-validation.
The n-gram graph is a representation model that has been used in other machine learning techniques; extending it to text streams that are generated at high speed and classified in real time was a challenge.
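As an illustration of the 10-fold cross-validation protocol mentioned above, the sketch below generates train/test index splits in plain Python. The helper name `k_fold_indices`, the shuffling step and the fixed seed are assumptions for this example; the source does not say how the folds were drawn or whether they were stratified.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices are shuffled once with a fixed seed (an assumption made here
    for reproducibility), then dealt round-robin into k folds. Each fold
    serves as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Each document index appears in exactly one test fold, so averaging a metric over the ten splits uses every text once for testing.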

Summary

Introduction

Text classification is a supervised machine learning technique that is frequently used in many applications, such as event detection [1] and sentiment analysis [2]. Text streams typically generate small texts continuously, which can be sent simultaneously or frequently to a subscriber that performs continuous, low-latency processing on them. In this context, a single-node classification approach can become a bottleneck under real-time requirements, and distributed solutions or novel data models and algorithms are preferred over traditional approaches that assume fixed-size, historical datasets. The majority of applications that process text streams are subject to the following four main constraints: a single pass over the observations, real-time response, bounded memory and concept drift, as defined by Nguyen et al. [3]. In this research we propose a streaming text classification method that uses the n-gram graph representation model and is designed with

Journal: Frontiers in Applied Mathematics and Statistics
Publication Date: Sep 11, 2018
Citations: 16
License type: CC BY 4.0
