Distributed storage optimization using multi-agent systems in Hadoop

Manar Sais,Rabie Mahdaoui,Najat Rafalia,Jaafar Abouchabaka

doi:10.1051/e3sconf/202341201091

Abstract

Understanding data and extracting information from it are the main objectives of data science, especially when it comes to big data. To achieve these goals, it is necessary to collect and process massive data sets, arriving at the system in different formats at great velocity. The Big Data era has brought us new challenges in data storage and management, and existing state-ofthe-art data storage and processing tools are poised to meet the challenges while posing challenges to the next generation of data. Big Data storage optimization is essential for improving the overall efficiency of Big Data systems by maximizing the use of storage resources. It also reduces the energy consumption of Big Data systems, resulting in financial savings, environmental protection, and improved system performance. Hadoop provides a solution for storing and analysing large quantities of data. However, Hadoop can encounter storage management problems due to its distributed nature and the management of large volumes of data. In order to meet future challenges, the system needs to intelligently manage its storage system. The use of a multi-agent system presents a promising approach for efficiently managing hot and cold data in HDFS. These systems offer a flexible, distributed solution for solving complex problems. This work proposes an approach based on a multi-agent system capable of gathering information on data access activity in the HDFS cluster. Using this information, it classifies data according to its temperature (hot or cold) and makes decisions about data replication based on its classification. In addition, it compresses unused data to manage resources efficiently and reduce storage space usage.

Full Text