A theoretically proposed algorithm in a decision tree format for choosing an efficient storage type of large datasets

Sofiia Materynska, Vadym Yaremenko, Walery Rogoza

doi:10.15587/2706-5448.2022.251281

Abstract

The object of research is methods and approaches for improving storage efficiency and optimizing access to large amounts of data. The study is important because big data is now widespread and the right choice of technologies is needed to improve the efficiency of big data processing systems. The choice is complicated by the large number of data storages and databases now available, so the best decision requires a deep understanding of the advantages, disadvantages and features of each. A further difficulty is the lack of a universal algorithm for deciding on the optimal repository. Accordingly, based on experiments and an analysis of existing projects and research papers, a decision-making algorithm is proposed that determines the best way to store large datasets, depending on their characteristics and additional system requirements. It is intended to simplify system design in the early stages of big data processing projects. By highlighting the key differences, as well as the advantages and disadvantages, of each type of storage and database, a list of key characteristics of the data and the future system that should be considered during design was compiled. The algorithm is a theoretical proposal based on the studied research papers. Using it at the design stage of a system, it would be possible to quickly and clearly determine the optimal type of storage for large datasets. The paper considers column-oriented, document-oriented, graph and key-value databases, as well as distributed file systems and cloud services.

Highlights

There is a significant increase in the amount of data produced by large companies, enterprises, governments, and ordinary people.
One of the main problems in applying big data technologies is data storage [1]: in addition to storing data in a certain form, it is necessary to provide appropriate access to it at each stage of its use.
After raw data intended for machine learning, artificial intelligence processing or analytics has been pre-processed and cleaned, it must be placed in the specific data storage or database that will be most effective in the particular situation [2].

Summary

Introduction

There is a significant increase in the amount of data produced by large companies, enterprises, governments, and ordinary people. This data comes from different sources and relates to different fields: healthcare, economics, marketing, business, and others. Such huge amounts of data result from the expanding use of social networks, electronic devices, and other information technologies. One of the main problems in applying big data technologies is data storage [1]: in addition to storing data in a certain form, it is necessary to provide appropriate access to it at each stage of its use. The aim of this research is a comparative analysis of big data storage methods and the creation of a decision support system to determine the most appropriate method.
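
To make the idea of a decision tree for storage selection concrete, the following is a minimal Python sketch. It is not the authors' algorithm: the tree itself is not reproduced on this page, so every criterion, field name, and the order of the checks below are hypothetical and chosen only for illustration. Only the six storage categories are taken from the abstract.

```python
from dataclasses import dataclass
from enum import Enum


class Storage(Enum):
    """Storage categories listed in the abstract."""
    COLUMN_ORIENTED = "column-oriented database"
    DOCUMENT_ORIENTED = "document-oriented database"
    GRAPH = "graph database"
    KEY_VALUE = "key-value database"
    DISTRIBUTED_FS = "distributed file system"
    CLOUD_SERVICE = "cloud service"


@dataclass
class Requirements:
    """Hypothetical yes/no questions about the data and the future system.

    These fields are illustrative assumptions, not criteria from the paper.
    """
    managed_infrastructure: bool = False  # team prefers a hosted service
    unstructured_files: bool = False      # raw files for batch processing
    relationship_heavy: bool = False      # queries traverse many links
    lookup_by_key_only: bool = False      # access is always by a single key
    column_aggregations: bool = False     # analytics over a few columns


def choose_storage(req: Requirements) -> Storage:
    """Walk an assumed decision tree from the root to a leaf."""
    if req.managed_infrastructure:
        return Storage.CLOUD_SERVICE
    if req.unstructured_files:
        return Storage.DISTRIBUTED_FS
    if req.relationship_heavy:
        return Storage.GRAPH
    if req.lookup_by_key_only:
        return Storage.KEY_VALUE
    if req.column_aggregations:
        return Storage.COLUMN_ORIENTED
    # Assumed default leaf: semi-structured records without special needs.
    return Storage.DOCUMENT_ORIENTED


if __name__ == "__main__":
    example = Requirements(relationship_heavy=True)
    print(choose_storage(example).value)
```

Each `if` corresponds to one internal node of the assumed tree, and each `return` to a leaf. A real decision support system would replace these yes/no questions with the data and system characteristics identified in the paper.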

Journal: Technology audit and production reserves
Publication Date: Jan 19, 2022
License type: CC BY 4.0
