Abstract

The object of research is the methods and approaches used to improve storage efficiency and optimize access to large amounts of data. The importance of this study lies in the widespread adoption of big data and the need to select the right technologies to make big data processing systems more efficient. The choice is complicated by the large number of data stores and databases currently available, so making the best decision requires a deep understanding of the advantages, disadvantages and features of each. A further difficulty is the lack of a universal algorithm for choosing the optimal repository. Accordingly, based on experiments and an analysis of existing projects and research papers, a decision-making algorithm was proposed that determines the best way to store large datasets depending on their characteristics and on additional system requirements. Its purpose is to simplify system design in the early stages of big data processing projects. By highlighting the key differences, as well as the advantages and disadvantages of each type of storage and database, a list was compiled of the key characteristics of the data and of the future system that should be considered during design. The algorithm is a theoretical proposal based on the studied research papers. Using it at the system design stage, it would be possible to determine the optimal storage type for large datasets quickly and clearly. The paper considers column-oriented, document-oriented, graph and key-value databases, as well as distributed file systems and cloud services.
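
The abstract does not spell out the decision rules themselves, so the following is only a hypothetical Python sketch of how a rule-based storage selector of this kind could be structured. The `Profile` fields, the rule order and the mapping from characteristics to storage types are all assumptions based on common rules of thumb (graph databases for relationship traversal, key-value stores for lookups by a single key, column-oriented databases for analytical aggregation, document-oriented databases for semi-structured records, distributed file systems or cloud storage for raw unstructured files). They are not the paper's algorithm.

```python
from dataclasses import dataclass
from enum import Enum


class Storage(Enum):
    # The storage types considered in the paper.
    GRAPH = "graph database"
    KEY_VALUE = "key-value store"
    DOCUMENT = "document-oriented database"
    COLUMNAR = "column-oriented database"
    DFS = "distributed file system"
    CLOUD = "cloud storage service"


@dataclass
class Profile:
    # Hypothetical data/system characteristics, NOT the paper's criteria list.
    structure: str            # "structured", "semi-structured" or "unstructured"
    relationship_heavy: bool  # queries traverse many links between records
    key_lookup_only: bool     # access is mostly get/put by a single key
    analytical_queries: bool  # aggregations over a few columns of many rows
    managed_infra: bool       # team prefers not to operate its own cluster


def choose_storage(p: Profile) -> Storage:
    """Hypothetical rule order: the first matching rule wins."""
    if p.relationship_heavy:
        return Storage.GRAPH
    if p.key_lookup_only:
        return Storage.KEY_VALUE
    if p.structure == "unstructured":
        # Raw files (logs, images, video) typically processed in batch.
        return Storage.CLOUD if p.managed_infra else Storage.DFS
    if p.analytical_queries:
        return Storage.COLUMNAR
    if p.structure == "semi-structured":
        return Storage.DOCUMENT
    # Fallback for the remaining cases; an arbitrary choice in this sketch.
    return Storage.DOCUMENT


if __name__ == "__main__":
    profile = Profile(structure="semi-structured", relationship_heavy=False,
                      key_lookup_only=False, analytical_queries=False,
                      managed_infra=False)
    print(choose_storage(profile).value)  # prints "document-oriented database"
```

Because the first matching rule wins, the order of the checks encodes priorities between characteristics; a practical selector would also have to weigh the additional system requirements that the abstract mentions, which this sketch leaves out.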

Highlights

  • After raw data intended for machine learning, artificial intelligence or analytics has been pre-processed and cleaned, it must be placed in the data store or database that is most effective for the particular use case [2]

Summary

Introduction

There is a significant increase in the amount of data produced by large companies, enterprises, governments, and ordinary people. This data comes from different sources and relates to different fields: healthcare, economics, marketing, business, and others. Such huge amounts of data result from the expanding use of social networks, electronic devices, and other information technologies. One of the main problems in applying big data technologies is data storage [1]: in addition to storing data in a certain form, it is necessary to provide access to it that meets the needs of each stage at which the data is used. The aim of this research is a comparative analysis of big data storage methods and the creation of a decision support system to determine the most appropriate method.

Research methodology
Research results and discussion
Conclusions