Improvement of Text Data Storage Methods

Vasyl Lytvyn, Maria Talakh, Dmytro Uhryn, Artem Kalancha

doi:10.23939/sisn2024.15.102

Abstract

This research analyzes the qualitative characteristics of messages in the Telegram messenger, which serve as raw data for further analysis of textual content. The parameters of these messages, such as their format, size, presence of noise, and speed, are reviewed in detail. The main goal of the article is to model the optimal approach to storing a large amount of data before the text analysis stage. The research included a detailed review of the literature devoted to this topic. The article examines the main advantages and disadvantages of existing data preprocessing algorithms, as well as problems related to data cleanliness and their impact on potential research results. In the software experiments, the impact of data preprocessing on the size of the data saved for further use, and on the speed of input data generation, was evaluated. Two of the proposed methods are highlighted: saving cleaned tokens in string format, and saving word codes in string format together with a word-code dictionary. These methods are intended to ensure an effective distribution of the text analysis system's tasks over the course of the day.
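To make the two highlighted storage methods concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the cleaning rule (lowercasing and keeping only runs of letters), the function names, and the space-separated string layout are all assumptions chosen for illustration.

```python
import re


def clean_tokens(message):
    """Assumed cleaning rule: lowercase the text and keep only runs of letters."""
    return re.findall(r"[^\W\d_]+", message.lower())


def store_as_token_strings(messages):
    """Method 1 (sketch): one string of space-separated cleaned tokens per message."""
    return [" ".join(clean_tokens(m)) for m in messages]


def store_as_code_strings(messages):
    """Method 2 (sketch): one string of integer word codes per message,
    plus the word-code dictionary needed to decode them."""
    word_to_code = {}
    rows = []
    for m in messages:
        codes = []
        for token in clean_tokens(m):
            # Assign the next free integer code the first time a word is seen.
            code = word_to_code.setdefault(token, len(word_to_code))
            codes.append(str(code))
        rows.append(" ".join(codes))
    return rows, word_to_code


def restore_tokens(rows, word_to_code):
    """Decode code strings back into token lists using the dictionary."""
    code_to_word = {code: word for word, code in word_to_code.items()}
    return [[code_to_word[int(c)] for c in row.split()] for row in rows]


if __name__ == "__main__":
    sample = ["Hello, world! Hello 123 :)", "World news: hello"]
    print(store_as_token_strings(sample))
    rows, vocab = store_as_code_strings(sample)
    print(rows, vocab)
    print(restore_tokens(rows, vocab))
```

In this sketch, the second method stores each distinct word once in the dictionary and refers to it by a short code elsewhere, which tends to pay off when the vocabulary is small relative to the corpus. Whether it actually reduces size or speeds up input generation for Telegram data is what the paper's experiments evaluate.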

More From: Vìsnik Nacìonalʹnogo unìversitetu "Lʹvìvsʹka polìtehnìka". Serìâ Ìnformacìjnì sistemi ta merežì
