Creation of language networks based on texts with using visibility graphs algorithms

Dmytro Lande, Oleh Dmytrenko

doi:10.20535/2411-1031.2018.6.2.153486

Abstract

A method for constructing language networks is proposed. Key words and concepts are retrieved from a set of documents that describe a subject domain. Each word is assigned a numeric value using the TF-IDF metric, which is intended to reflect how important a word is to a document in a collection or corpus. The result is a time series. The visibility graph algorithm, a tool from time series analysis, is then used to construct the graph of the subject domain. Two topical subject domains, “Space” and “Computer graphic”, are considered as examples, and the proposed method is applied to sets of documents related to them. A network of connections between the terms and concepts that occur in the textual documents is built. Building networks of words, whose nodes are elements of the text, makes it possible to reveal the key components of the text. At the same time, the task of identifying structural elements of the text that are also informationally important remains relevant. The research found that words such as “uranium”, “nuclear”, “waste”, “Jupiter”, “Mercury”, “Moon”, “Earth”, “comet”, “space” and others are key for the subject area “Space”. The article shows that using the TF metric alone is more expedient than using the TF-IDF metric when the set of documents describes a single subject domain. The results of applying the visibility graph algorithm and the compactified horizontal visibility graph algorithm are also compared. It was found that in some cases the compactified horizontal visibility graph algorithm gives a network of words with more connections between concepts than the visibility graph algorithm. Gephi, an open-source visualization and exploration software for all kinds of graphs and networks, and an original package of specially developed Python modules are used as additional tools for simulation and visualization.
The proposed method can be used to visualize a subject domain and in information decision support systems, where it helps reveal the key components of a subject domain. The results can also be used to build user interfaces for information retrieval systems that make searching for relevant information easier.
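
The pipeline described in the abstract (word weights read as a time series, then a visibility graph over that series) can be illustrated with a minimal Python sketch. It is not the authors' package. It assumes that the series is the sequence of word weights in text order, it uses plain relative term frequency (TF) as the weight, and it applies the standard natural visibility criterion: two points are linked when every point between them lies strictly below the straight line joining them. The function names `term_frequency_series` and `visibility_graph` are hypothetical.

```python
from collections import Counter


def term_frequency_series(tokens):
    """Replace each token by the relative frequency (TF) of its word.

    Assumption: the "time" axis is simply the position of the word in the text.
    """
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[word] / total for word in tokens]


def visibility_graph(series):
    """Return the edges (a, b), a < b, of the natural visibility graph.

    Points a and b see each other when every intermediate point c lies
    strictly below the line segment from (a, series[a]) to (b, series[b]).
    This is the straightforward cubic-time check, fine for short series.
    """
    edges = set()
    n = len(series)
    for a in range(n):
        for b in range(a + 1, n):
            ya, yb = series[a], series[b]
            if all(
                series[c] < yb + (ya - yb) * (b - c) / (b - a)
                for c in range(a + 1, b)
            ):
                edges.add((a, b))
    return edges


if __name__ == "__main__":
    tokens = "the moon orbits the earth and the earth orbits the sun".split()
    weights = term_frequency_series(tokens)
    for a, b in sorted(visibility_graph(weights)):
        print(tokens[a], "--", tokens[b])
```

Neighbouring positions are always linked, so the interesting structure comes from the long-range edges that high-weight words collect. Swapping in a TF-IDF weight only changes `term_frequency_series`; the graph construction stays the same.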

Highlights

A method is proposed for building networks from texts, the so-called language networks.
Using visibility graph algorithms as a tool for time series analysis, a graph of the subject domain is built between the extracted key concepts.
Visibility graph algorithms are applied to a set of preselected text documents thematically related to outer space and computer graphics, and a network of words is built.

Summary

Introduction

A method is proposed for building networks from texts, the so-called language networks. Using visibility graph algorithms as a tool for time series analysis, a graph of the subject domain is built between the extracted key concepts. Visibility graph algorithms are applied to a set of preselected text documents thematically related to outer space and computer graphics, and a network of words is built. The paper also compares the results of the visibility graph algorithm with those of the compactified horizontal visibility graph algorithm.
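
The compactified horizontal visibility graph mentioned above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation. The horizontal visibility criterion used here is the standard one: two positions are linked when every value between them is strictly smaller than both. The compactification step is assumed to merge all positions that carry the same word into a single node, dropping self-loops and duplicate edges. The function names and the sample weights are hypothetical.

```python
def horizontal_visibility_edges(series):
    """Return the edges (a, b), a < b, of the horizontal visibility graph.

    Positions a and b are linked when every value strictly between them
    is lower than min(series[a], series[b]).
    """
    edges = set()
    n = len(series)
    for a in range(n - 1):
        highest_between = float("-inf")
        for b in range(a + 1, n):
            if highest_between < min(series[a], series[b]):
                edges.add((a, b))
            highest_between = max(highest_between, series[b])
            if highest_between >= series[a]:
                break  # a bar at least as tall as series[a] blocks everything beyond it
    return edges


def compactify(tokens, edges):
    """Merge positions that hold the same word into one node (assumed step).

    Returns a set of unordered word pairs; self-loops are discarded.
    """
    word_edges = set()
    for a, b in edges:
        u, v = tokens[a], tokens[b]
        if u != v:
            word_edges.add(tuple(sorted((u, v))))
    return word_edges


if __name__ == "__main__":
    tokens = ["moon", "orbits", "earth", "moon"]
    weights = [0.5, 0.25, 0.25, 0.5]  # hypothetical per-position word weights
    position_edges = horizontal_visibility_edges(weights)
    print(sorted(compactify(tokens, position_edges)))
```

After merging, the node degree counts how many distinct words a term is linked to, which is one way to rank candidate key terms in the resulting network.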

Journal: Collection "Information technology and security"
Publication Date: Dec 30, 2018
License type: cc-by
