Abstract

In recent years, a number of platforms for building Big Data applications, both open-source and proprietary, have been proposed. One of the most popular is Apache Hadoop, an open-source software framework for Big Data processing used by leading companies such as Yahoo and Facebook. Because early versions of Hadoop did not prioritize security, security features have been added to Hadoop incrementally. In particular, the Hadoop Distributed File System (HDFS), upon which other Hadoop modules are built, did not provide robust user authentication. This paper proposes a token-based authentication scheme that protects sensitive data stored in HDFS against replay and impersonation attacks. The proposed scheme allows HDFS clients to be authenticated by the datanode via the block access token. Unlike most HDFS authentication protocols, which adopt public-key exchange approaches, the proposed scheme uses a hash chain of keys. The performance of the proposed scheme (communication power, computing power, and area efficiency) is as good as that of existing HDFS systems.
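The abstract does not specify how the hash chain of keys is built or consumed. As a point of reference only, the sketch below shows a generic Lamport-style one-way hash chain, the textbook form of the technique; it is not the paper's construction. SHA-256, the names `h`, `build_chain` and `ChainVerifier`, and the single-verifier setup are assumptions made for illustration. The verifier stores only the last accepted chain value, accepts a key only if that key hashes to it, and then advances, so a replayed key is rejected.

```python
import hashlib
import os
from typing import List


def h(data: bytes) -> bytes:
    """One-way hash; SHA-256 is an assumed choice, not taken from the paper."""
    return hashlib.sha256(data).digest()


def build_chain(seed: bytes, n: int) -> List[bytes]:
    """Return [H^0(seed), H^1(seed), ..., H^n(seed)]."""
    chain = [seed]
    for _ in range(n):
        chain.append(h(chain[-1]))
    return chain


class ChainVerifier:
    """Hypothetical verifier that stores only the last accepted chain value."""

    def __init__(self, anchor: bytes) -> None:
        # The anchor is the public end of the chain, H^n(seed).
        self.current = anchor

    def verify(self, key: bytes) -> bool:
        # Accept a key only if hashing it yields the last accepted value.
        if h(key) == self.current:
            self.current = key  # advance, so the same key cannot be reused
            return True
        return False


if __name__ == "__main__":
    chain = build_chain(os.urandom(32), 5)
    verifier = ChainVerifier(chain[5])
    print(verifier.verify(chain[4]))  # True: fresh key
    print(verifier.verify(chain[4]))  # False: replayed key
    print(verifier.verify(chain[3]))  # True: next key in the chain
```

Keys are released in the reverse order of generation, so an eavesdropper who captures one key cannot compute the next one without inverting the hash.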

Highlights

  • With the growth of social networks and smart devices, the use of Big Data has increased dramatically over the past few years

  • Open-source platforms for scalable and distributed processing of data are being actively studied in Cloud Computing, in which dynamically scalable and often virtualized IT resources are provided as a service over the Internet [4]

  • This paper proposes a token-based authentication scheme that protects sensitive Hadoop Distributed File System (HDFS) data against replay and impersonation attacks

Summary

Introduction

With the growth of social networks and smart devices, the use of Big Data has increased dramatically over the past few years. In Hadoop, delegation token approaches use symmetric encryption, and the shared keys may be distributed to hundreds or even thousands of hosts, depending on the token type [15,16]. This leaves Hadoop communication vulnerable to eavesdropping and modification, making replay and impersonation attacks more likely. Hadoop security controls also require the namenode and the datanode to share a private key in order to use the block access token. This paper proposes a token-based authentication scheme that protects sensitive HDFS data against replay and impersonation attacks. The proposed scheme allows clients to be authenticated to the datanode via the block access token.
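For comparison with the proposed scheme, the following sketch models the conventional shared-key block access token described above: the namenode signs a token with a key it shares with the datanode, and the datanode recomputes the signature before serving the block. The token fields, HMAC-SHA256, the JSON encoding, the names `issue_token` and `Datanode`, and the per-token nonce used to reject replays are all hypothetical simplifications. They do not reproduce Hadoop's actual block token format, and real block tokens can be reused until they expire, whereas this model treats each token as single-use.

```python
import hashlib
import hmac
import json
import os
import time

# Hypothetical secret shared by the namenode and the datanode.
SHARED_KEY = os.urandom(32)


def _sign(key: bytes, body: dict) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the token body (assumed)."""
    payload = json.dumps(body, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def issue_token(key: bytes, user: str, block_id: int, lifetime_s: int = 600) -> dict:
    """Namenode side: build and sign a block access token."""
    body = {
        "user": user,
        "block_id": block_id,
        "expiry": int(time.time()) + lifetime_s,
        "nonce": os.urandom(16).hex(),  # unique per token, used to spot replays
    }
    return {"body": body, "tag": _sign(key, body)}


class Datanode:
    """Datanode side: checks signature, block id, expiry and nonce freshness."""

    def __init__(self, key: bytes) -> None:
        self.key = key
        self.seen_nonces = set()

    def authenticate(self, token: dict, block_id: int) -> bool:
        body = token["body"]
        if not hmac.compare_digest(_sign(self.key, body), token["tag"]):
            return False  # forged or modified token
        if body["block_id"] != block_id or body["expiry"] < time.time():
            return False  # token for another block, or expired
        if body["nonce"] in self.seen_nonces:
            return False  # replayed token
        self.seen_nonces.add(body["nonce"])
        return True


if __name__ == "__main__":
    dn = Datanode(SHARED_KEY)
    token = issue_token(SHARED_KEY, "alice", block_id=42)
    print(dn.authenticate(token, 42))  # True: valid, first use
    print(dn.authenticate(token, 42))  # False: replay
```

Anyone holding `SHARED_KEY` can mint tokens the datanode will accept, which illustrates why distributing a symmetric key to many hosts widens the attack surface.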

Related work
MapReduce
Overview
Token generation
Authentication in client and datanode communication
Security evaluation
Performance evaluation
Conclusion