Building monitoring clusters on cloud computing technologies is a promising direction for developing systems that continuously monitor objects of various kinds in the web space. The Hadoop web-programming environment is the technological basis for developing algorithmic and software solutions for the synthesis of monitoring clusters, including information security and information counteraction systems. International Telecommunication Union (ITU) Recommendation Y.3510 sets out requirements for cloud infrastructure, among them monitoring the performance of deployed applications based on collected real-world statistics. The computing resources of monitoring clusters in cloud data centers are often allocated to continuous parallel processing of high-speed streaming data, which imposes new requirements on monitoring technologies and necessitates the creation and study of new models of parallel computing. Service monitoring plays an important role in the cloud computing industry, especially for SLA/QoS assessment, because an application or service may experience problems even when the virtual machines it runs on appear to be operational. This calls for studying methods of organizing parallel processing for high-speed streaming services that handle huge volumes of bit data and, at the same time, for estimating the computational resources required. For conditions in which the bit rate of information generated by a source changes rapidly, a generally applicable model of the bit rate of Discretized Stream (DStream) formation is proposed. Based on the poly-burst nature of the bit rate model, a model of the group content traffic of arbitrary sources of different services processed in the cloud cluster is constructed.
These results made it possible to develop mathematical models of parallel DStreams from sources processed in a cloud cluster with Hadoop technology, using the micro-batch architecture of the Spark Streaming module. The models take into account, on the one hand, the flow of service requests from sources of different services and, on the other hand, the bit-rate demands of the services, given the multichannel traffic of sources of various services. In addition, analytical relations are obtained for calculating the required performance of the Hadoop cluster for a given batch loss probability.
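
The link between cluster performance and batch loss probability can be illustrated with a minimal sketch. It does not reproduce the poly-burst model or the analytical relations of this work. It assumes a much simpler, hypothetical setting: `n_sources` independent on/off sources, each active during a micro-batch with probability `p_active` and emitting at a fixed `peak_rate` when active, with a batch counted as lost whenever the aggregate rate exceeds the cluster capacity. All function and parameter names are illustrative.

```python
from math import comb


def tail_probability(n_sources, p_active, k):
    """Return P(more than k sources are active) for a Binomial(n_sources, p_active) count."""
    return sum(
        comb(n_sources, j) * p_active**j * (1 - p_active) ** (n_sources - j)
        for j in range(k + 1, n_sources + 1)
    )


def required_capacity(n_sources, p_active, peak_rate, target_loss):
    """Smallest capacity (a multiple of peak_rate) whose overflow probability
    does not exceed target_loss, under the assumed on/off source model."""
    for k in range(n_sources + 1):
        # With capacity k * peak_rate, a batch overflows only if more than
        # k sources are active during the batch interval.
        if tail_probability(n_sources, p_active, k) <= target_loss:
            return k * peak_rate
    return n_sources * peak_rate


if __name__ == "__main__":
    # Hypothetical figures: 50 sources, each active 20% of the time at 8 Mbit/s.
    capacity = required_capacity(50, 0.2, 8, 1e-3)
    print(f"Capacity for a batch loss probability of at most 1e-3: {capacity} Mbit/s")
```

Under these assumptions, tightening `target_loss` can only keep or raise the returned capacity, which mirrors the trade-off between cluster performance and batch loss probability described above.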