Abstract

The MonALISA (Monitoring Agents using a Large Integrated Services Architecture) framework provides a set of distributed services for the monitoring, control, management and global optimization of large-scale distributed systems. It is based on an ensemble of autonomous, multi-threaded, agent-based subsystems which are registered as dynamic services. They can be automatically discovered and used by other services or clients. The distributed agents can collaborate and cooperate in performing a wide range of management, control and global optimization tasks (such as network monitoring and resource accounting) using real-time monitoring information. MonALISA includes a coherent set of network management services that collect, in near real time, information about the network topology, the main data flows, traffic volume and the quality of connectivity. A set of dedicated modules was developed in the MonALISA framework to periodically perform network measurement tests between all sites. We developed global services to present, in near real time, the entire network topology used by a community. The time evolution of the global network topology is shown in a dedicated GUI. Changes in the global topology at this level occur quite frequently, and even small modifications in the connectivity map may significantly affect network performance. The global topology graphs are correlated with active end-to-end network performance measurements, done between all sites using the Fast Data Transfer application. Access to both real-time and historical data, as provided by MonALISA, is also important for developing services able to predict usage patterns and to aid in allocating resources efficiently on a global scale. For resource accounting, MonALISA collects information regarding the amounts of resources consumed by the users, which represent virtual organizations in a large-scale distributed system.
Besides providing statistical information, an accounting system can also be the basis for managing distributed resources according to an economic model. In the MonALISA monitoring framework we developed modules that provide accounting facilities, collecting information from cluster managers such as Condor, PBS, LSF and SGE. The usage statistics are used for intelligent management of the resources.
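
As a rough illustration of the accounting idea described above, the following Python sketch sums resource usage per virtual organization from job records gathered at several sites. It is not MonALISA code and does not use any MonALISA API: the `JobRecord` type, its field names, the `usage_by_vo` helper and the sample values are all hypothetical, and real cluster managers such as Condor, PBS, LSF and SGE each report usage in their own formats.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    """One finished job as a hypothetical, already-normalized accounting record."""
    site: str            # site that ran the job
    vo: str              # virtual organization the submitting user represents
    manager: str         # source cluster manager, e.g. "Condor", "PBS", "LSF", "SGE"
    cpu_seconds: float   # CPU time consumed by the job
    wall_seconds: float  # wall-clock time consumed by the job


def usage_by_vo(records):
    """Sum job count, CPU time and wall-clock time per virtual organization."""
    totals = defaultdict(lambda: {"jobs": 0, "cpu_seconds": 0.0, "wall_seconds": 0.0})
    for rec in records:
        entry = totals[rec.vo]
        entry["jobs"] += 1
        entry["cpu_seconds"] += rec.cpu_seconds
        entry["wall_seconds"] += rec.wall_seconds
    return dict(totals)


# Made-up sample records, for illustration only.
records = [
    JobRecord("site-a", "vo1", "Condor", 3600.0, 4000.0),
    JobRecord("site-b", "vo1", "PBS", 1800.0, 2000.0),
    JobRecord("site-a", "vo2", "SGE", 600.0, 900.0),
]
summary = usage_by_vo(records)
```

Per-organization totals of this kind are the sort of usage statistics that a resource management policy, for example one based on an economic model, could take as input.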

Highlights

  • An important part of managing global-scale systems is a monitoring system that is able to monitor and track in real time many site facilities, networks, and tasks in progress

  • MonALISA, which stands for Monitoring Agents using a Large Integrated Services Architecture, is a monitoring framework designed as an ensemble of dynamic services, able to collaborate and cooperate in performing a wide range of information gathering and processing tasks

  • We present a set of services developed in the context of the MonALISA framework for monitoring and controlling large scale networks, as an extension of the work previously presented in [2]

Summary

INTRODUCTION

An important part of managing global-scale systems is a monitoring system that is able to monitor and track in real time many site facilities, networks, and tasks in progress. The monitoring information gathered is essential for developing the required higher-level services, the components that provide decision support and some degree of automated decisions, and for maintaining and optimizing workflow in large-scale distributed systems (LSDS). These management and global optimization functions are performed by higher-level agent-based services. In an LSDS environment, the monitoring framework has to intelligently collect a large number of monitoring events that are generated by the system components during execution or during interaction with external objects (such as users or processes). Monitoring such events is necessary for observing the run-time behavior of the large-scale distributed system and for providing the status information required for debugging, tuning and managing processes.

SYSTEM DESIGN
NETWORK MONITORING AND MANAGEMENT
MONITORING AND REPRESENTATION OF NETWORK TOPOLOGIES AT DIFFERENT OSI LAYERS
The Physical Network Layer Topology
A REAL USE-CASE FOR TOPOLOGY INFORMATION
Layer 3 Routed Network Topology
Automatic storage discovery for ALICE
Monitoring modules for dynamic light path provisioning
MONITORING ALICE DISTRIBUTED COMPUTING ENVIRONMENT
COLLECTING ACCOUNTING INFORMATION WITH MONALISA
Collecting Information from Remote Sites
Failure Handling
Processing and Storing Accounting Information in the MonALISA Repositories
CASE STUDY
CONCLUSION