Knowledge-driven graph similarity for text classification

Niloofer Shanavas,Hui Wang,Glenn Hawe,Zhiwei Lin

doi:10.1007/s13042-020-01221-4

Open Access

https://doi.org/10.1007/s13042-020-01221-4

Abstract

Automatic text classification using machine learning is significantly affected by the text representation model. The structural information in text, which is necessary for natural language understanding, is usually ignored in vector-based representations. In this paper, we present a graph kernel-based text classification framework which utilises the structural information in text effectively through the weighting and enrichment of a graph-based representation. We introduce weighted co-occurrence graphs to represent text documents, which weight the terms and their dependencies based on their relevance to text classification. We propose a novel method to automatically enrich the weighted graphs using semantic knowledge in the form of a word similarity matrix. The similarity between enriched graphs, which we call knowledge-driven graph similarity, is calculated using a graph kernel. The semantic knowledge in the enriched graphs ensures that the graph kernel goes beyond exact matching of terms and patterns to compute the semantic similarity of documents. In experiments on sentiment classification and topic classification tasks, our knowledge-driven similarity measure significantly outperforms the baseline text similarity measures on five benchmark text classification datasets.
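
As a rough illustration of the representation described in the abstract, the sketch below builds a co-occurrence graph from a token list and then adds edges to semantically similar words. The sliding-window size, the count-based edge weights, the toy similarity dictionary `sim`, the threshold, and the function names `cooccurrence_graph` and `enrich` are all assumptions made for illustration; the paper's relevance-based weighting and its actual enrichment procedure are not reproduced here.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=1):
    """Undirected co-occurrence graph as {(u, v): weight}, endpoints sorted.

    Assumption: the edge weight is the raw number of co-occurrences within
    `window` following tokens, a stand-in for relevance-based weights.
    """
    edges = defaultdict(float)
    for i, u in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            if u != v:
                edges[tuple(sorted((u, v)))] += 1.0
    return dict(edges)


def enrich(edges, sim, threshold=0.7):
    """Add edges that link similar words to the neighbours of existing terms.

    Hypothetical rule: for an edge (a, b) and a word s whose similarity to a
    is at least `threshold`, add edge (s, b) weighted by weight * similarity.
    """
    enriched = dict(edges)
    for (u, v), w in edges.items():
        for a, b in ((u, v), (v, u)):
            for s, score in sim.get(a, {}).items():
                if score >= threshold and s != b:
                    key = tuple(sorted((s, b)))
                    enriched[key] = max(enriched.get(key, 0.0), w * score)
    return enriched


# Toy similarity lookup standing in for a word similarity matrix.
sim = {"movie": {"film": 0.9}, "great": {"good": 0.8, "fine": 0.5}}

graph = cooccurrence_graph(["great", "movie", "great", "plot"])
enriched = enrich(graph, sim)
print(sorted(enriched.items()))
```

With enrichment, a document that says "good film" can share an edge with one that says "great movie", which is the kind of non-exact match the abstract refers to.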

Highlights

Research on automatic text classification has gained importance due to the information overload problem and the need for faster and more accurate extraction of knowledge from huge data sources
Graph-based representations of text are effective for text classification as they can model the structural information in text, which is required to understand its meaning
We focused on building a text graph model that represents the structural information in text effectively, which helps to compare documents based on their main shared content

Summary

Introduction

Research on automatic text classification has gained importance due to the information overload problem and the need for faster and more accurate extraction of knowledge from huge data sources. Bag-of-words is the most commonly used text representation scheme; it is based on the term independence assumption, where a text document is regarded as a set of unordered terms and is represented as a vector. This discards the structural information in text, so we instead represent documents as weighted co-occurrence graphs enriched with semantic knowledge. We use an edge walk graph kernel to utilise the information in the enriched weighted graphs for calculating the similarity between text documents. The kernel function takes as input a pair of weighted co-occurrence graphs and gives as output a similarity value based on matching relevant content of the text documents. The novel contributions made in this paper are (1) the proposed weighting of the graph, (2) the automatic enrichment of graphs and (3) the application of the new graph-based text representation to build the knowledge-driven similarity measure.
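
The kernel's input and output described above can be sketched as follows. This is a minimal illustration, assuming each graph is a dictionary that maps an undirected edge to its weight, and that similarity is the sum of weight products over shared edges, normalised so that a graph compared with itself scores 1. The function name `edge_kernel` and this exact normalisation are assumptions; the paper's edge walk kernel may differ in detail.

```python
import math


def edge_kernel(g1, g2):
    """Normalised similarity between two weighted graphs.

    Each graph is {(u, v): weight} with endpoints stored in sorted order.
    Assumption: every edge present in both graphs contributes the product
    of its two weights, and the total is divided by the norms of both
    graphs (a cosine-style normalisation).
    """
    shared = g1.keys() & g2.keys()
    dot = sum(g1[e] * g2[e] for e in shared)
    norm1 = math.sqrt(sum(w * w for w in g1.values()))
    norm2 = math.sqrt(sum(w * w for w in g2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # an empty graph matches nothing
    return dot / (norm1 * norm2)


# Toy documents, already converted to weighted co-occurrence graphs.
doc_a = {("great", "movie"): 2.0, ("great", "plot"): 1.0}
doc_b = {("great", "movie"): 1.0, ("boring", "plot"): 1.0}
doc_c = {("market", "stock"): 3.0}

print(edge_kernel(doc_a, doc_b))  # one shared edge, so a value between 0 and 1
print(edge_kernel(doc_a, doc_c))  # no shared edges
```

Because the output is a symmetric similarity value for any pair of graphs, a matrix of such values over a training set could be passed to a kernel-based classifier such as a support vector machine with a precomputed kernel.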

Related work

Proposed weighted co‐occurrence graph representation

Automatic enrichment of graphs

Node enrichment

Edge enrichment

Example to illustrate node enrichment and edge enrichment

Graph kernels for measuring document similarity

Graph kernel‐based text classification pipeline

Experiments and results

Findings

Conclusion

Journal: International Journal of Machine Learning and Cybernetics
Publication Date: Nov 19, 2020
Citations: 7
License type: open-access
