On Bi-gram Graph Attributes

Thomas Konstantinovsky, Matan Mizrachi

doi:10.5539/cis.v14n3p78

Abstract

We propose a new approach to text semantic analysis and general corpus analysis based on what this article terms a "bi-gram graph" representation of a corpus. Attributes derived from graph theory, such as the chromatic number and graph coloring, graph density, and the graph K-core, are measured and analyzed either as insights in their own right or in comparison with other corpus graphs. A vast domain of tools and algorithms can be developed on top of the graph representation; creating such a graph proves to be computationally cheap, and much of the heavy lifting is achieved via basic graph calculations. Furthermore, we showcase different use cases for bi-gram graphs and how scalable they prove to be when dealing with large datasets.
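To make the representation concrete, the following is a minimal sketch of one possible bi-gram graph, assuming whitespace tokenization, lowercase folding, and undirected, unweighted edges between adjacent words. None of these choices is confirmed by the abstract, and the helper names (`bigram_graph`, `density`, `greedy_coloring`) are hypothetical. The greedy coloring only gives an upper bound on the chromatic number; it is not an exact computation.

```python
from collections import defaultdict


def bigram_graph(sentences):
    """Build an undirected graph: words are nodes, and an edge joins two
    words that appear next to each other in at least one sentence."""
    adj = defaultdict(set)
    for sentence in sentences:
        tokens = sentence.lower().split()  # assumption: whitespace tokens
        for tok in tokens:
            adj[tok]  # touch the key so isolated words become nodes too
        for a, b in zip(tokens, tokens[1:]):
            if a != b:  # ignore self-loops such as "very very"
                adj[a].add(b)
                adj[b].add(a)
    return dict(adj)


def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 0.0 if n < 2 else 2 * m / (n * (n - 1))


def greedy_coloring(adj):
    """Colour nodes greedily, highest degree first. The number of colours
    used is an upper bound on the chromatic number, not its exact value."""
    colors = {}
    for node in sorted(adj, key=lambda v: (-len(adj[v]), v)):
        used = {colors[n] for n in adj[node] if n in colors}
        c = 0
        while c in used:
            c += 1
        colors[node] = c
    return colors


# Toy corpus, invented purely for illustration.
corpus = ["the cat sat on the mat", "the dog sat on the log"]
graph = bigram_graph(corpus)
coloring = greedy_coloring(graph)
print(len(graph), "nodes, density", round(density(graph), 3))
print("colours used:", len(set(coloring.values())))
```

Building the graph takes a single pass over the tokens, which is consistent with the claim that construction is computationally cheap; the attribute computations then operate only on the adjacency structure.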

Highlights

Corpus representation is central to natural language processing.
The K-core approach to dimensionality reduction and noise reduction proposed in section 3.2 shows outstanding results when used in classification-based machine learning pipelines.
In the paper's example, a corpus of spam and ham SMS messages, a classic text classification task in machine learning and natural language processing, was cleared of stopwords and converted into a bag-of-words representation.
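The K-core idea in the highlights can be illustrated with a short sketch. The paper's section 3.2 is not reproduced on this page, so the following is only one plausible reading: keep the words that survive in the k-core of the bi-gram graph and build bag-of-words vectors over that reduced vocabulary. The function names, the value `k = 2`, and the toy messages are all assumptions made for illustration, and stopword removal is omitted for brevity.

```python
from collections import Counter, defaultdict


def bigram_graph(docs):
    """Undirected graph linking words that are adjacent in some document."""
    adj = defaultdict(set)
    for doc in docs:
        tokens = doc.lower().split()
        for tok in tokens:
            adj[tok]  # register every word as a node
        for a, b in zip(tokens, tokens[1:]):
            if a != b:
                adj[a].add(b)
                adj[b].add(a)
    return dict(adj)


def k_core(adj, k):
    """Return the nodes of the k-core: repeatedly peel off nodes whose
    remaining degree is below k until no such node is left."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    queue = [v for v, nbrs in adj.items() if len(nbrs) < k]
    removed = set()
    while queue:
        v = queue.pop()
        if v in removed:
            continue
        removed.add(v)
        for u in adj[v]:
            if u not in removed:
                adj[u].discard(v)
                if len(adj[u]) < k:
                    queue.append(u)
    return set(adj) - removed


def bag_of_words(doc, vocabulary):
    """Count vector of `doc` over an ordered vocabulary list."""
    counts = Counter(doc.lower().split())
    return [counts[word] for word in vocabulary]


# Made-up messages standing in for a spam/ham SMS corpus.
messages = [
    "win a free prize now",
    "claim a free prize today",
    "free prize waiting claim now",
    "see you at lunch",
]
core_words = k_core(bigram_graph(messages), k=2)
vocabulary = sorted(core_words)
vectors = [bag_of_words(msg, vocabulary) for msg in messages]
print(vocabulary)
print(vectors[0])
```

Words that occur only at the fringe of the graph are peeled away, so the feature vectors shrink from the full vocabulary to the densely connected core. Whether this matches the paper's exact procedure cannot be confirmed from the text shown here.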

Summary

Introduction

Corpus representation is central to natural language processing. This paper highlights the benefits and use cases of a representation based on inner word relationships derived from the bi-grams of a given corpus. Previous works that used similar methods revolve around solving a specific problem using a graph representation. Massé et al. (2008) suggested a new way of grounding the meanings of certain words in sensorimotor categories. Reimer and Hahn (1988) proposed a model of knowledge-based text condensation that resembles today's well-known knowledge graphs. Many graph attributes are left untouched in natural language processing due to the different representations available. Violos et al. (2018) highlighted the benefits of combining N-gram flexibility with the well-structured representation of directed graphs, and their application to classification problems.

Journal: Computer and Information Science
Publication Date: Jul 29, 2021
License type: CC BY 4.0
