Intelligent inference methods for automated network diagnostics

Chathuranga Widanapathirana

doi:10.4225/03/58b8a701f2cf3

Abstract

In today’s complex networks, timely identification and resolution of performance problems is extremely challenging. Current diagnostic practices to identify the root causes of such problems primarily rely on human intervention and investigation. Fully automated and scalable systems capable of identifying complex problems are needed to provide rapid and accurate diagnosis. The study presented in this thesis creates the necessary scientific basis for the automatic diagnosis of network performance faults using novel intelligent inference techniques based on machine learning. We propose three new techniques for characterisation of network soft failures, and by using them, create the Intelligent Automated Network Diagnostic (IAND) system. First, we propose Transmission Control Protocol (TCP) trace characterisation techniques that use aggregated TCP statistics. Faulty network components embed unique artefacts in TCP packet streams by altering the normal protocol behaviour. Our technique captures such artefacts and generates a set of unique fault signatures. We first introduce Normalised Statistical Signatures (NSSs) with 460 features, a novel representation of network soft failures to provide the basis for diagnosis. Since not all 460 features contribute equally to the identification of a particular fault, we then introduce improved forms of NSSs called EigenNSS and FisherNSS with reduced complexity and greater class separability. Evaluations show that with these signatures we can achieve dimensionality reduction of over 95% and detection accuracies of up to 95%, with microsecond diagnosis times. Second, given that NSSs have features that are dependent on link properties, we introduce a technique called Link Adaptive Signature Estimation (LASE) using regression-based predictors to artificially generate NSSs for a large number of link parameter combinations.
Using LASE, the system can be trained to suit the exact networking environment, however dynamic, with a minimal set of sample data. For extensive performance evaluation, we collected 1.2 million sample traces for 17 types of device failures on 8 TCP variants over various types of networks using a combination of fault injection and link emulation techniques. Third, to automate fault identification, we propose a modular inference technique that learns from the patterns embedded in the signatures, and create Fault Classifier Modules (FCMs). FCMs use support vector machines to uniquely identify individual faults and are designed using soft class boundaries to provide generalised fault detection capability. The use of a modular design and a generic algorithm that can be trained and tuned for specific faults offers scalability and is a key differentiator from existing systems that use a specific algorithm to detect each fault. Experimental evaluations show that FCMs can achieve detection accuracies of between 90% and 98%. The signatures and classifiers are used as the building blocks to create the IAND system with its two main sub-systems: IAND-k and IAND-h. IAND-k is a modular diagnostic system for automatic detection of previously known problems using FCMs. The IAND-k system is applied to accurately detect faulty links and diagnose problems in end-user devices in a wide range of network types (IAND-kUD, IAND-kCC). Extensive evaluation of the systems demonstrated high overall detection accuracies of up to 96.6% with low false positives, and over 90% accuracy even in the most difficult scenarios. Here, the FCMs use supervised machine learning methods and can only detect previously known problems. To extend the diagnostic capability to detect previously unknown problems, we propose IAND-h, a hybrid classifier system that uses a combination of unsupervised machine learning-based clustering and supervised machine learning-based classification.
The evaluation of the system shows that previously unknown faults can be detected with over 92% accuracy. The IAND-h system also offers real-time detection capability with diagnosis times between 4 μs and 66 μs. The techniques and systems proposed during this research contribute to the state of the art of network diagnostics and focus on scalability, automation and modularity with evaluation results demonstrating a high degree of accuracy.
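
The modular idea behind the FCMs (one trainable classifier per fault, all sharing the same generic algorithm) can be illustrated with a small sketch. This is not the thesis implementation: the two features (retransmission count and mean RTT), the toy values, the fault names, the min-max normalisation, and the linear soft-margin SVM trained by sub-gradient descent are all assumptions chosen to keep the example self-contained.

```python
# Hypothetical toy data: each signature is [retransmissions, mean RTT in ms].
# Feature choice, values, and fault names are illustrative assumptions only.
TRACES = [
    ([2, 20], "healthy"), ([4, 22], "healthy"), ([3, 25], "healthy"), ([5, 21], "healthy"),
    ([40, 24], "fault_a"), ([45, 20], "fault_a"), ([42, 26], "fault_a"), ([50, 23], "fault_a"),
    ([3, 180], "fault_b"), ([5, 200], "fault_b"), ([2, 190], "fault_b"), ([4, 185], "fault_b"),
]


def fit_scaler(rows):
    """Return per-feature (minimum, span) pairs for min-max normalisation."""
    cols = list(zip(*rows))
    return [(min(c), (max(c) - min(c)) or 1.0) for c in cols]


def scale(row, scaler):
    """Map a raw signature into [0, 1] per feature (for values in the training range)."""
    return [(v - lo) / span for v, (lo, span) in zip(row, scaler)]


def score(x, module):
    """Signed distance-like score of a normalised signature under one module."""
    w, b = module
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_module(xs, ys, lam=0.01, eta=0.1, steps=2000):
    """Linear soft-margin SVM: full-batch sub-gradient descent on the
    L2-regularised hinge loss. Labels in ys are +1.0 or -1.0."""
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        gw, gb = [lam * wj for wj in w], 0.0
        for x, y in zip(xs, ys):
            if y * score(x, (w, b)) < 1:  # only margin violators contribute
                gw = [g - y * xj / n for g, xj in zip(gw, x)]
                gb -= y / n
        w = [wj - eta * g for wj, g in zip(w, gw)]
        b -= eta * gb
    return w, b


def train_all(traces):
    """Train one module per fault label (one-vs-rest) with the same algorithm."""
    scaler = fit_scaler([row for row, _ in traces])
    xs = [scale(row, scaler) for row, _ in traces]
    modules = {}
    for fault in sorted({label for _, label in traces}):
        ys = [1.0 if label == fault else -1.0 for _, label in traces]
        modules[fault] = train_module(xs, ys)
    return scaler, modules


def diagnose(row, scaler, modules):
    """Report the fault whose module gives the highest score."""
    x = scale(row, scaler)
    return max(modules, key=lambda fault: score(x, modules[fault]))


scaler, modules = train_all(TRACES)
print(diagnose([44, 22], scaler, modules))
```

Adding a new fault in this sketch means training one more module on the same signatures, without touching the existing ones, which is the scalability argument the abstract makes for a generic, per-fault trainable classifier.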
