Abstract

Graph-based text representation is an important preprocessing step in data and text mining, natural language processing (NLP), and information retrieval. Graph-based methods represent text documents as graphs so as to make the best use of their characteristics. This study reviews the methods employed or developed for graph-based text representation and lists their advantages and disadvantages. The literature shows that some of the proposed methods fail to represent texts adequately in certain situations. Although several techniques are now in common use, their weaknesses and shortcomings still significantly affect the success of graph representation and graph matching. In this review, we conduct a comprehensive survey of the state of the art in graph-based text representation and learning. We provide a formal description of the problem of graph-based text representation and introduce some basic concepts. More significantly, this study proposes a new taxonomy of graph-based text representation that categorizes existing studies by representation characteristics and scheme techniques. Within the representation scheme taxonomy, we introduce four main types of conceptual graph schemes and summarize the challenges faced in each. The main issues of graph representation, such as research topics and the sub-taxonomy of graph models for web documents, are introduced and categorized. The review also covers natural language understanding tasks that depend on different types of graph structures. In addition, the graph matching taxonomy comprises three main categories based on the matching approach: structural, semantic, and similarity-based.
Moreover, these approaches are compared in depth in terms of methods and tools, the concepts of matching and locality, and the application domains that use these tools. Finally, the paper recommends seven promising directions for future study in graph-based text representation. These recommendations are summarized and highlighted as open problems and challenges of graph-based text representation and learning, to help researchers in this field address the remaining research gaps.

Highlights

  • The Web has been a significant source of knowledge on every subject or domain in recent years

  • While most of the studies we reviewed are highly scalable in graph-theoretic terms (i.e., O(|E|) representation), important work remains in scaling vertex and graph representation methods to truly massive text documents

  • The survey provides basic definitions of the structure of graph-based text representations and proposes a new taxonomy for the main issues related to graph-based text representation

Summary

INTRODUCTION

The Web has been a significant source of knowledge on every subject or domain in recent years. Experimental results obtained by [2] showed that graph-based algorithms outperformed methods based on the bag-of-words (BOW) model on a specific document. This approach can define causal relationships and improve the execution of the textual similarity steps. A text corpus is represented as a labeled directed graph with words as nodes, while edges indicate the syntactic relationships between words. They proposed a new path-constrained graph walk approach in which high-level information about important sequences directs the process of graph walking.

C. GRAPH-BASED REPRESENTATION IN NATURAL LANGUAGE PROCESSING

Some natural language understanding tasks depend on different types of graph structures, for example, word co-occurrence graphs, word-document graphs, sentences as graphs, and knowledge graphs.
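
As a rough illustration of one of the graph structures mentioned above, the following minimal sketch builds a word co-occurrence graph using only the Python standard library. It is not taken from the reviewed paper: the function name `cooccurrence_graph`, the whitespace tokenizer, the sliding `window` parameter, and the choice of undirected weighted edges are all assumptions made for the example.

```python
from collections import defaultdict


def cooccurrence_graph(text, window=2):
    """Build an undirected, weighted word co-occurrence graph.

    Nodes are lowercased whitespace tokens (an assumed, very naive
    tokenizer). The weight of edge (u, v) counts how many times u and v
    occur within `window` tokens of each other.
    """
    tokens = text.lower().split()
    graph = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        # Pair the current token with the next `window` tokens.
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            other = tokens[j]
            if other == word:
                continue  # skip self-loops
            graph[word][other] += 1
            graph[other][word] += 1  # keep the adjacency symmetric
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {node: dict(neighbors) for node, neighbors in graph.items()}


graph = cooccurrence_graph(
    "graphs represent text and graphs represent structure", window=2
)
print(graph["graphs"])
```

In this toy sentence, "graphs" and "represent" co-occur twice within the window, so their edge has weight 2. A real pipeline would typically add proper tokenization, stop-word filtering, and possibly directed or syntactically labeled edges, depending on the representation scheme being studied.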

OPEN PROBLEMS AND RESEARCH GAPS
Findings
CONCLUSIONS AND FUTURE WORK
