Distributed decentralized collaborative monitoring architecture for cloud infrastructures

Xiaolong Xu, Yun Chen, Jose M Alcaraz Calero

doi:10.1007/s10586-016-0675-5

Abstract

Cloud computing infrastructures demand an efficient monitoring mechanism to guarantee the operational state of large-scale virtualized data centers and to provide mechanisms that improve the efficiency and stability of such infrastructures. Traditionally, centralized monitoring models (CMM) provide high performance and availability for the group of nodes in charge of monitoring tasks. However, the centralized nature of this architecture easily leads to a single point of failure, performance bottlenecks, and an unbalanced distribution of monitoring workloads. These characteristics make it unsuitable for large-scale cloud infrastructures. To address this concern, the main contribution of this paper is a distributed collaborative monitoring model (DCMM) for cloud computing infrastructures. DCMM provides self-organization capabilities based on mutual perception and balanced monitoring of each node. DCMM also provides rapid notification and recovery mechanisms under degraded conditions. In addition, an adaptive threshold control algorithm (ATCA) is proposed to dynamically adapt the sets of thresholds used for notification purposes in order to identify unnecessary duplicate information sent back to the monitoring tool. ATCA is based on historical monitoring records. Both DCMM and ATCA are described in detail in this contribution. Several empirical experiments have been carried out on an OpenStack cloud infrastructure to validate these claims. Experimental results show that DCMM with ATCA can efficiently balance the monitoring workload, reduce the workload of monitoring nodes, avoid a single point of failure, and reduce bottleneck problems, while contributing to real-time monitoring and data consistency within the monitoring architecture.
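
As a rough illustration of the idea behind adaptive thresholds, the following Python sketch shows one way a monitored node could suppress near-duplicate reports using its own recent history. It is not the ATCA algorithm described in the paper: the class name `AdaptiveReporter`, the parameters `window`, `k`, and `min_threshold`, and the use of the standard deviation of recent samples as the threshold are all assumptions made for this example.

```python
from collections import deque
from statistics import pstdev


class AdaptiveReporter:
    """Hypothetical node-side filter (not the paper's ATCA).

    A sample is reported only if it differs from the last reported value
    by more than a threshold derived from recent history.
    """

    def __init__(self, window=20, k=1.0, min_threshold=0.5):
        self.history = deque(maxlen=window)  # recent samples (historical records)
        self.k = k                           # assumed scaling factor
        self.min_threshold = min_threshold   # assumed lower bound on the threshold
        self.last_reported = None

    def threshold(self):
        # With too little history, fall back to the fixed lower bound.
        if len(self.history) < 2:
            return self.min_threshold
        # Otherwise scale the threshold with the spread of recent samples.
        return max(self.min_threshold, self.k * pstdev(self.history))

    def should_report(self, value):
        # The threshold is computed from history before the new sample is added.
        limit = self.threshold()
        self.history.append(value)
        if self.last_reported is None or abs(value - self.last_reported) > limit:
            self.last_reported = value
            return True
        return False


reporter = AdaptiveReporter()
cpu_samples = [50.0, 50.1, 50.2, 80.0, 80.3]
decisions = [reporter.should_report(v) for v in cpu_samples]
```

In this sketch the first sample and the jump to 80.0 are reported, while the small fluctuations around them are suppressed, so the monitoring node receives fewer redundant updates.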

Full Text