Abstract

Hadoop technology is accompanied by a number of security issues. In its early stages, developers focused mostly on building basic functionality, and the design of security components was not a primary concern. As a result, the technology remained vulnerable to malicious activities of unauthorized users whose purpose is to endanger system functionality or to compromise private user data. Researchers and developers continuously try to resolve these issues by upgrading Hadoop's security mechanisms and preventing undesirable malicious activity. This paper presents the most common HDFS security problems and a review of unauthorized access issues. First, the Hadoop mechanism and its main components are described as an introduction to the central research problem. Then, the HDFS architecture is presented, and all of its components and functionalities are introduced. Next, all possible types of users are listed, with an emphasis on unauthorized users, who are of particular importance for this paper. One part of the research is dedicated to Hadoop security levels and to environment and user assessment. The review also includes an explanation of the Log Monitoring and Audit features and a detailed consideration of authorization and authentication issues. Possible consequences of unauthorized access to a system are covered, and several recommendations for solving the problem of unauthorized access are offered. Honeypot nodes, security mechanisms for collecting valuable information about malicious parties, are presented in the last part of the paper. Finally, the idea of developing a new type of intrusion detector based on an artificial neural network is presented. The detector will be an integral part of a new kind of virtual honeypot mechanism and represents the starting point for the authors' future scientific work.

Highlights

  • The Big Data concept is based on storing, processing, and transferring vast amounts of unstructured, semi-structured, and structured data [1]

  • Besides these three essential characteristics, Big Data can be described by variability and complexity [4]

  • Future work of the authors of this paper will be based on further research of virtual honeypots and on proposing a new solution that will eventually improve overall Hadoop Distributed File System (HDFS) security and efficiently restrict unauthorized access by malicious parties

Introduction

The Big Data concept is based on storing, processing, and transferring vast amounts of unstructured, semi-structured, and structured data [1]. Big Data technology is gaining global importance and is expected to grow exponentially in the future [2]. This technology provides new opportunities for all industry sectors, companies, and institutions that depend on high-quality processing of large amounts of raw data. It can be described by three main properties (the "3V" properties): volume, velocity, and variety [3]. Volume refers to the sheer amount of data, variety is determined by the data types present within a data set, and velocity represents the speed at which data is stored and processed. Besides these three essential characteristics, Big Data can be described by variability (inconsistency, with periodic peaks in the data flow) and complexity (various types of data coming from multiple sources) [4]. To improve the effectiveness and increase the robustness of existing Big Data systems, the Hadoop mechanism was proposed.
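As background for the access-control discussion that follows, the sketch below shows how a client typically talks to HDFS through the standard Hadoop Java client API and prints the POSIX-style permission, owner, and group metadata on which HDFS authorization checks are based. This is a minimal illustration, not code from the paper; the NameNode address hdfs://localhost:9000 and the /user path are placeholder assumptions.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsPermissionListing {
        public static void main(String[] args) throws Exception {
            // Point the client at the NameNode. The address below is a
            // placeholder; a real cluster supplies its own fs.defaultFS.
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");

            FileSystem fs = FileSystem.get(conf);

            // List a directory and print the POSIX-style permission, owner,
            // and group attributes HDFS attaches to every entry. These
            // attributes are the basis of HDFS authorization checks.
            for (FileStatus status : fs.listStatus(new Path("/user"))) {
                System.out.printf("%s %s:%s %s%n",
                        status.getPermission(), // e.g. rwxr-xr-x
                        status.getOwner(),
                        status.getGroup(),
                        status.getPath());
            }

            fs.close();
        }
    }

Note that by default HDFS trusts the client-supplied identity when evaluating these permissions; as the paper discusses, stronger authentication (e.g., Kerberos) is needed to make such checks meaningful against unauthorized users.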

The New Era of Distributed File System
HDFS Architecture
Key Security Challenges
Utilizing Remote Procedure Call Protocol
Replication Storage Model
Cluster Security Levels
User Access Monitoring
User and Environment Assessment
Authorization Labeling
Node Based Authentication Issues
Threats and Possible Attacks
Honey-Based Intrusion Detection
Deploying Honeypot Nodes
Future Work
Conclusions