Hadoop Cluster Research Articles

The creation of monitoring clusters based on cloud computing technologies is a promising direction for the development of systems for continuous monitoring of objects for various purposes in the web space. Hadoop web-programming environment is the technological basis for the development of algorithmic and software solutions for the synthesis of monitoring clusters, including information security and information counteraction systems. The International Telecommunication Union’ (ITU) recommendations Y. 3510 present the requirements for cloud infrastructure that require monitoring the performance of deployed applications based on the collection of real-world statistics. Often, computing resources of monitoring clusters of cloud data centers are allocated for continuous parallel processing of high-speed streaming data, which imposes new requirements to monitoring technologies, necessitating the creation and research of new models of parallel computing. The need to use service monitoring plays an important role in the cloud computing industry, especially for SLA/QoS assessment, as the application or service may experience problems even if the virtual machines on which the work is taking place appear to be operational. This requires to study the methodological possibilities of organization to study of parallel processing high-speed streaming services with the processing of huge amounts of bit data, and, simultaneously, to estimate the necessary computational resource. In the conditions of high dynamics of changes in the bit rate of information generation from the source, a model of the bit rate of Discretized Stream (DStream) formation is proposed, which has a common application. Based on the poly-burst nature of the bit rate model, a model of group content traffic of any sources of different services processed in the cloud cluster was created. The obtained results made it possible to develop mathematical models of parallel DStreams from sources processed in a cloud cluster via Hadoop technology using the micro-batch architecture of the Spark Streaming module. These models take into account the flow of requests for maintenance from sources of different services, on the one hand, and, on the other hand, the needs of services in bit rate, taking into account the multichannel traffic of sources of various services. At the same time, analytical relations are obtained to calculate the required performance of the Hadoop cluster at a given value of the probability of batch loss.

In the past ten years, rapid progress has been observed in science and technology through the development of smart mobile devices, workstations, supercomputers, smart gadgets and network servers. Increase in the number of Internet users and a multiple increase in the speed of the Internet led to the generation of a huge amount of data, which is now commonly called «big data». Given this scenario, storing and processing data on local servers or personal computers can cause a number of problems that can be solved using distributed computing, distributed data storage and distributed data transfer. There are currently several cloud service providers to solve these problems, like Amazon Web Services, Microsoft Azure, Cloudera and etc. Approaches for distributed computing are supported using powerful data processing centers (DPCs). However, traditional DPCs require expensive equipment, a large amount of energy to run and operate the system, a powerful cooling system and occupy a large area. In addition, to maintain such a system, its constant use is necessary, because its stand-by is economically disadvantageous. The article is aimed at the possibility of using a Raspberry Pi and Hadoop cluster for distributed storage and processing of «big data». Such a trip provides low power consumption, the use of limited physical space, high-speed solution to the problems of processing data. Hadoop provides the necessary modules for distributed processing of big data by deploying Map-Reduce software approaches. Data is stored using the Hadoop Distributed File System (HDFS), which provides more flexibility and greater scalability than a single computer. The proposed hardware and software data processing system based on Raspberry Pi 3 microcomputer can be used for research and scientific purposes at universities and scientific centers. Considered distributed system shows economically efficiency in comparison to traditional DPCs. The results of pilot project of Raspberry Pi cluster application are presented. A distinctive feature of this work is the use of distributed computing systems on single-board microcomputers for academic purposes for research and educational tasks of students with minimal cost and ease of creating and using the system.

Hadoop Cluster Research Articles

Related Topics

Articles published on Hadoop Cluster

Novel data‐placement scheme for improving the data locality of Hadoop in heterogeneous environments

A Paucity of Code Status Documentation Despite Increasing Complex Chronic Disease in Pediatrics

A unit-based, cost-efficient scheduler for heterogeneous Hadoop systems

Scalable Mining of Contextual Outliers Using Relevant Subspace

Job failure prediction in Hadoop based on log file analysis

From Big Data to business analytics: The case study of churn prediction

Research on campus network cloud storage open platform based on cloud computing and big data technology

Massive Power Information Processing Scheme Based on MongoDB

K‐PSO: An improved PSO‐based container scheduling algorithm for big data applications

Fine-grained data-locality aware MapReduce job scheduler in a virtualized environment

Processing streams in a monitoring cloud cluster

A two steps method of resources utilization predication for large Hadoop data center

Secure and privacy-preserving multi-keyword ranked information retrieval from encrypted big data

Teaching Hadoop Using Role Play Games

Secure and privacy-preserving multi-keyword ranked information retrieval from encrypted big data

Fault prediction for distributed computing Hadoop clusters using real-time higher order differential inputs to SVM: Zedacross

Fault prediction for distributed computing Hadoop clusters using real-time higher order differential inputs to SVM: Zedacross

Performance tuning analysis of spatial operations on Spatial Hadoop cluster with SSD

Hardware and software data processing system for research and scientific purposes based on Raspberry Pi 3 microcomputer

Resilient intrusion detection system for cloud containers

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Hadoop Cluster Research Articles

Related Topics

Articles published on Hadoop Cluster

Novel data‐placement scheme for improving the data locality of Hadoop in heterogeneous environments

A Paucity of Code Status Documentation Despite Increasing Complex Chronic Disease in Pediatrics

A unit-based, cost-efficient scheduler for heterogeneous Hadoop systems

Scalable Mining of Contextual Outliers Using Relevant Subspace

Job failure prediction in Hadoop based on log file analysis

From Big Data to business analytics: The case study of churn prediction

Research on campus network cloud storage open platform based on cloud computing and big data technology

Massive Power Information Processing Scheme Based on MongoDB

K‐PSO: An improved PSO‐based container scheduling algorithm for big data applications

Fine-grained data-locality aware MapReduce job scheduler in a virtualized environment

Processing streams in a monitoring cloud cluster

A two steps method of resources utilization predication for large Hadoop data center

Secure and privacy-preserving multi-keyword ranked information retrieval from encrypted big data

Teaching Hadoop Using Role Play Games

Secure and privacy-preserving multi-keyword ranked information retrieval from encrypted big data

Fault prediction for distributed computing Hadoop clusters using real-time higher order differential inputs to SVM: Zedacross

Fault prediction for distributed computing Hadoop clusters using real-time higher order differential inputs to SVM: Zedacross

Performance tuning analysis of spatial operations on Spatial Hadoop cluster with SSD

Hardware and software data processing system for research and scientific purposes based on Raspberry Pi 3 microcomputer

Resilient intrusion detection system for cloud containers