Abstract

This paper presents a model to monitor Cloud Computing for anomalous activity. Hadoop is a widely used open source Cloud Computing framework for processing huge data sets. The model uses a Machine Learning technique to detect and classify anomalies in sensory observations and helps to ensure the stability of virtual sensor networks. The framework is built on top of the Hadoop MapReduce implementation and applies one of the Machine Learning techniques to detect these anomalies. Preliminary results show that our classification mechanism is promising and able to detect anomalous events that may pose a threat to the Cloud.

General Terms
MapReduce, Hadoop, Anomaly Detection, Machine Learning, Cloud Computing, Weka.

Keywords
MapReduce, Hadoop, anomaly detection, Machine Learning, Cloud Computing.

1. INTRODUCTION
Cloud Computing is rapidly becoming more common in distributed computing environments. Cloud environments are used for the storage and processing of data, and Cloud Computing delivers infrastructure, applications, and programs over the Internet. It is a model that enables easy, on-demand network access to a shared pool of computing resources, such as networks, servers, storage, services, and applications, that can be rapidly provisioned and released with minimal management effort or service provider interaction. Considering the deployment scenario, the following models are distinguished: Private Cloud, Public Cloud, Community Cloud, and Hybrid Cloud [1] [2].

MapReduce is a Cloud framework, developed by Google, for processing problems over massive data sets. As a popular open source implementation of MapReduce, Hadoop has been widely used by large companies such as Yahoo and eBay for data-intensive jobs [3]. However, successful execution of such jobs is not easy. On one hand, the machines used in the Cloud are usually low-cost ones, which raises the probability of hardware faults; on the other hand, problems such as program bugs can also cause system performance degradation.

MapReduce usually divides the input data set into independent splits, whose number depends on the size of the data set and on the number of nodes used, and it provides two main functions, Map and Reduce. The Map function takes a series of (key, value) pairs, processes each of them, and generates zero or more output (key, value) pairs; the input and output types of the Map function can be, and often are, different from each other. The Reduce function then aggregates and combines all the intermediate values output by the Map function that share the same intermediate key [4]. The Hadoop framework includes a distributed file system, the Hadoop Distributed File System (HDFS), used to support the processing and management of large-scale data sets. Furthermore, MapReduce in Hadoop is designed to work efficiently with HDFS by moving the computation to the data, and not the other way around, which allows Hadoop to achieve high data locality [5].
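To make the Map and Reduce roles described above concrete, a minimal word-count job written against the Hadoop MapReduce Java API is sketched below; the class names and the assumption of plain, whitespace-separated text input are illustrative and are not taken from this paper's framework.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map: for every input line, emit a (word, 1) pair per token.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE); // zero or more output pairs per input pair
      }
    }
  }

  // Reduce: sum all counts that share the same intermediate key (word).
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input read from HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output written back to HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The mapper emits a (word, 1) pair for every token it sees, and the framework groups the intermediate pairs by key before handing each group to the reducer, which sums the counts for that word.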
Problems and errors are always reflected as system anomalies, and such anomalies may cause longer job times and a deterioration of data transfer speed; moreover, if they are critical, the task may be interrupted. Therefore, it is essential to detect anomalies in time in order to reduce and avoid losses. Certain characteristics of MapReduce, however, make this difficult: tasks with the same configuration environment can show inconsistent execution times, and even the same task executed at different times may have different run times, as a result of the volatility and uncertainty of the system. Consequently, some of the conventional methods that rely on response time are not effective for detecting anomalies in the MapReduce setting [6]. Those methods use a common fixed timeout threshold, where tasks are flagged as anomalous if their execution times exceed the threshold (a minimal sketch of such a fixed-threshold check is given at the end of this section). Moreover, MapReduce is multi-node and distributed by nature; eBay, for example, operates clusters of 532 nodes (8*532 cores, 5.3 PB) in total for MapReduce [7], and tasks in MapReduce are executed on many interconnected nodes. Thus, methods intended to detect anomalies on a single node [8, 9] are not suited to the MapReduce environment.

The rest of this paper is organized as follows. Part II surveys related work, focusing on previous research that uses Hadoop and MapReduce in detection tasks, whether anomaly related or not, and presents their results. Experimental discussion and evaluation are described in Part III. Part IV presents the results and future work.
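For reference, the fixed-threshold scheme discussed above can be summarized in a few lines of Java; the threshold value, task-record fields, and class names below are hypothetical and serve only to show how a single static cutoff is applied regardless of how variable individual task run times are.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the fixed timeout-threshold approach:
// a task is flagged as anomalous only if its execution time exceeds one static cutoff.
public class FixedThresholdDetector {

  /** Hypothetical record of a finished MapReduce task. */
  public static class TaskRecord {
    final String taskId;
    final long executionTimeMs;

    TaskRecord(String taskId, long executionTimeMs) {
      this.taskId = taskId;
      this.executionTimeMs = executionTimeMs;
    }
  }

  private final long thresholdMs;

  public FixedThresholdDetector(long thresholdMs) {
    this.thresholdMs = thresholdMs;
  }

  /** Returns the tasks whose execution time exceeds the fixed threshold. */
  public List<TaskRecord> detect(List<TaskRecord> tasks) {
    List<TaskRecord> anomalies = new ArrayList<>();
    for (TaskRecord t : tasks) {
      if (t.executionTimeMs > thresholdMs) {
        anomalies.add(t);
      }
    }
    return anomalies;
  }

  public static void main(String[] args) {
    // With a 60 s cutoff, a 70 s task is flagged even if that duration is normal
    // for its node, while a task finishing in 55 s is never flagged at all.
    FixedThresholdDetector detector = new FixedThresholdDetector(60_000);
    List<TaskRecord> tasks = new ArrayList<>();
    tasks.add(new TaskRecord("attempt_0001_m_000001", 70_000));
    tasks.add(new TaskRecord("attempt_0001_m_000002", 55_000));
    detector.detect(tasks).forEach(t -> System.out.println("anomaly: " + t.taskId));
  }
}

Because the cutoff is the same for every task, it cannot accommodate the run-time variability described above, which is why such methods are not well suited to the MapReduce environment.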
