Victor – Proactive Fault Tracking and Resolution in Broadband Networks Using Collaborative Intelligent Agents

J. Odubiyi, G. Bayless, E. Ruberton

doi:10.1002/0470841818.ch20

Abstract

This paper presents Victor, a multi-agent system (MAS) prototype aimed at assessing and predicting faults in network elements that affect global Virtual Connections (VCs) with ATM, frame relay and IP service configurations. The fault assessment component of Victor aims to eliminate the VC tracking functions currently performed with labor-intensive methods, and to advise network operations personnel on specific actions to fix faults affecting specific VCs. The predictive fault management component can predict potential degradation of network elements (e.g., switches, cards and trunks). This would enable a timely response to network faults, significantly reducing the level of human expertise and effort required and reducing costly errors. Victor relies on distributed and collaborative intelligent processing agents to associate, correlate and combine knowledge and information from multiple operations databases. Since the network under study is driven by a service level agreement of 99.9% availability, Victor's multi-agent control strategies are implemented to avoid undesirable effects on the network service goals.

Introduction

Tracking end-to-end Virtual Connections (VCs) to locate the sources of VC and trunk performance degradations on Concert's global broadband (ATM, frame relay and IP) networks is currently performed using labor-intensive methods. To determine the health status of each VC, network support engineers typically execute several commands to identify the path from the entry node to the exit node for each subnet. This demands the coordination of several people, who must independently review separate databases to check information on network elements and communicate the IP addresses of network elements over the telephone. The process is error-prone and time-consuming.

Network fault management involves the physical monitoring of network elements, fault assessment and fault mitigation.
Traditional vendor-supplied network-element management systems present a never-ending stream of alarm information to the operator consoles in real time. Depending on the state of the network, the volume of alarms can range from tens to hundreds of messages per minute. Once an alarm is reported or a trouble ticket is opened, network operators at the Network Management Centers perform problem diagnosis to ascertain the problem and its exact location. During this manually intensive and time-consuming process, network performance often continues to deteriorate. Network fault assessment in a distributed environment is essential for prompt diagnosis and problem resolution to provide quality service to customers.

Victor will provide an intelligent approach to automatic fault assessment once a trouble ticket is generated. The Victor prototype will automatically check for trouble tickets, perform fault assessment and report the fault to the user. In addition, it supports proactive fault management by regularly polling for specific performance criteria such as the network's trunk and port statistics. It also calculates performance trends and alerts the operator to potential faults in the network. A rapid implementation of mitigation strategies may prevent a network anomaly from escalating into an unacceptable level of performance, which is a possible occurrence with the existing tool set.

The MAS employs a multi-layered system architecture based on the roles performed by each agent. Several researchers have applied the multi-layered approach: for ATM virtual path resource management (Hayzelden & Bigham, 1998), in Proteus for ATM network performance management (Odubiyi, Meekins, et al., 1999), and in SAIRE, an agent-based search engine (Odubiyi, Kocur, et al., 1997). The role modeling strategy has also been implemented successfully in generalized partial global planning (GPGP) (Victor Lesser, 1999) and for business process management (Wooldridge, Jennings & Kinney, 1999).
The Zeus collaborative agent building toolkit (Ndumu, Hyacinth, Collis & Lee, 1999) supports the role modeling process. Therefore, the challenge that we face in this project is not how to build an operational MAS, but how to build an open MAS (Luc Steels, 1998) in which the agents can adapt to changing operational environments. Since the agents are autonomous (Heckman & Wobbrock, 1998; Douglas Dyer, 1999), control strategies

1 This research was funded by the BT Corporate Research and Technology Programme Office at Adastral Park, Martlesham Heath, Ipswich, UK. The article was published in Agent Technology for Communication Infrastructures.

2 See Acknowledgements.
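
The Introduction describes Victor's proactive component as polling trunk and port statistics, calculating performance trends and alerting the operator to potential faults. The excerpt does not say how the trends are computed, so the following is only a minimal sketch under stated assumptions: it fits a least-squares line to polled trunk utilization and estimates when a threshold would be crossed. The names `Sample`, `fit_trend`, `minutes_until_threshold` and `should_alert`, the utilization metric, and the threshold and horizon values are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Sample:
    """One polled reading for a trunk (hypothetical structure)."""
    minute: float       # polling time, in minutes since the first poll
    utilization: float  # trunk utilization as a fraction between 0 and 1


def fit_trend(samples: Sequence[Sample]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line through the samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are needed to fit a trend")
    mean_t = sum(s.minute for s in samples) / n
    mean_u = sum(s.utilization for s in samples) / n
    var_t = sum((s.minute - mean_t) ** 2 for s in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one polling time")
    cov = sum((s.minute - mean_t) * (s.utilization - mean_u) for s in samples)
    slope = cov / var_t
    return slope, mean_u - slope * mean_t


def minutes_until_threshold(samples: Sequence[Sample],
                            threshold: float) -> Optional[float]:
    """Estimate minutes from the latest poll until the fitted line reaches
    the threshold. Returns 0.0 if it is already there, or None if the trend
    is flat or falling (assumption: only rising utilization is of interest)."""
    slope, intercept = fit_trend(samples)
    latest = max(s.minute for s in samples)
    if intercept + slope * latest >= threshold:
        return 0.0
    if slope <= 0:
        return None
    return (threshold - intercept) / slope - latest


def should_alert(samples: Sequence[Sample], threshold: float,
                 horizon_minutes: float) -> bool:
    """Alert when the threshold is predicted to be reached within the horizon."""
    eta = minutes_until_threshold(samples, threshold)
    return eta is not None and eta <= horizon_minutes


if __name__ == "__main__":
    readings = [Sample(0, 0.50), Sample(5, 0.55), Sample(10, 0.60), Sample(15, 0.65)]
    print(minutes_until_threshold(readings, threshold=0.90))
    print(should_alert(readings, threshold=0.90, horizon_minutes=30))
```

In this sketch a straight-line fit stands in for whatever trend calculation the prototype actually uses. It was chosen only because it makes the polling, trend and alert steps easy to follow.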
