Abstract

In the field of network security, the task of processing and analyzing huge amounts of Packet CAPture (PCAP) data is of utmost importance for developing and monitoring the behavior of networks, building intrusion detection and prevention systems, firewalls, etc. In recent times, Apache Spark in combination with Hadoop Yet-Another-Resource-Negotiator (YARN) has been evolving as a generic Big Data processing platform. While processing raw network packets, timely inference about network security is a primitive requirement. However, to the best of our knowledge, no prior work has presented a systematic study of fine-tuning the resources, scalability and performance of a distributed Apache Spark cluster while processing PCAP data. To obtain the best performance, various cluster parameters, such as the number of cluster nodes, the number of cores utilized from each node, the total number of executors run in the cluster, the amount of main memory used on each node, and the executor memory overhead allotted on each node to handle garbage-collection issues, have been fine-tuned; this is the focus of the proposed work. Through the proposed strategy, we could analyze 85GB of data (provided by CSIR Fourth Paradigm Institute) in just 78 seconds using a 32-node (256-core) Spark cluster, a task that would otherwise take around 30 minutes in traditional processing systems.
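As a rough illustration of where these tunables live, the minimal sketch below configures a Spark-on-YARN session; the application name, object name and all parameter values are hypothetical placeholders, not the configuration tuned in this work.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of a Spark-on-YARN job exposing the tunables discussed above.
// All values here are hypothetical placeholders, not the paper's tuned settings.
object PcapAnalysisJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pcap-analysis")
      .master("yarn")
      // Total number of executors launched across the cluster.
      .config("spark.executor.instances", "64")
      // Cores utilized from each node by every executor.
      .config("spark.executor.cores", "4")
      // Main memory allotted to each executor's JVM heap.
      .config("spark.executor.memory", "12g")
      // Off-heap headroom per executor (in MB); raising it helps avoid
      // YARN container kills caused by garbage-collection/native overhead.
      .config("spark.yarn.executor.memoryOverhead", "2048")
      .getOrCreate()

    // ... load and analyze the PCAP-derived records here ...
    spark.stop()
  }
}
```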

Highlights

  • Big data can be defined as information assets characterized by high volume, velocity, variety and veracity [1]

  • Various cluster parameters, such as the number of nodes in the cluster, the number of cores utilized from each node, the total number of executors run in the cluster, the amount of Random Access Memory (RAM) used from each node, and the YARN executor memory overhead allotted for each node to handle garbage-collection issues, have been fine-tuned; this is the focus of the proposed work

  • Four months of network trace data, amounting to 85GB, has been analyzed. The data has been processed in stages of 1 month, 2 months and 4 months. 32 nodes of the testbed have been used, each having 8 CPU cores and 32GB of RAM; a back-of-the-envelope resource split for this configuration is sketched after this list
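The sketch below works through one possible resource split for the testbed described above (32 nodes, 8 cores and 32GB RAM per node); the per-node executor split and the 10% overhead fraction are assumptions for illustration, not the experimentally tuned values.

```scala
// Back-of-the-envelope sizing for a 32-node testbed with 8 cores and 32 GB RAM
// per node. The executors-per-node split and the ~10% overhead fraction are
// assumptions for illustration, not the values tuned in the experiments.
object ClusterSizing extends App {
  val nodes            = 32
  val coresPerNode     = 8
  val ramPerNodeGb     = 32

  val totalCores       = nodes * coresPerNode        // 256 cores, as cited above
  val executorsPerNode = 2                           // assumed: 4 cores per executor
  val coresPerExecutor = coresPerNode / executorsPerNode

  // Reserve ~2 GB per node for the OS and Hadoop/YARN daemons, split the rest
  // across executors, and carve ~10% of each container out as memory overhead.
  val containerGb      = (ramPerNodeGb - 2) / executorsPerNode
  val overheadGb       = math.max(1, (containerGb * 0.10).round.toInt)
  val executorHeapGb   = containerGb - overheadGb

  println(s"executors: ${nodes * executorsPerNode} x $coresPerExecutor cores ($totalCores cores total)")
  println(s"per executor: ${executorHeapGb}g heap + ${overheadGb}g memoryOverhead")
}
```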

Summary

Anilkumar, CSIR-Fourth Paradigm Institute

INTRODUCTION
Apache Hadoop versus Apache Spark
Motivation for the Work
AND RELATED WORK
Cluster Setup and Spark Application Submission
Resource Allocation Schemes
Utilized Testbed Description
Model to Estimate Execution Time
Model to Estimate Memory Consumption
Model to Predict the Performance
RESULTS AND DISCUSSION
CONCLUSIONS AND FUTURE WORK
Future Work