Abstract

A Network Service Provider (NSP) is loosely defined as an organization that provides IP network transport as a service, either directly to consumers or to other value-added businesses. NSPs have struggled to reduce subscriber churn, which we define as customers switching from their current NSP to a competitor out of dissatisfaction; for our purposes, specifically dissatisfaction with network performance, such as excess latency or downtime. The focus of this paper is reliability and maintenance, in particular network resiliency and operations. In the context of this paper, network resiliency is defined as the rate at which corrective action is taken in response to an exogenous network disturbance or event that materially impacts the network service level as experienced by users. Operators want not only to shorten this period of unsatisfactory network service but to avoid it altogether, at the lowest possible operational cost, by proactively monitoring user network experience to detect anomalies, resolving them through automatic root cause determination, and ultimately restoring satisfactory network service levels. Today, in contrast, NSPs operate reactively: they employ teams of expensive network engineers who manually sift through massive amounts of data to determine root causes, prompted either by subscribers complaining about poor service (after customer impact) or by network alarms that may be symptoms of a more complex underlying root cause or, often, noise that does not materially impact users. In this paper we evaluate standard machine learning approaches to extracting root causes and explain a key underlying reason for their poor accuracy. To improve accuracy, we propose a novel multi-tier ensemble machine learning approach that dynamically adapts to changing network data feature sets, or combinations of characteristics, to yield accurate causal estimations.
Different algorithms yield results of differing accuracy because of the complex interactions among different combinations of characteristics. Results show that our approach improves customer experience and network operations by automatically detecting customer-impacting network anomalies and identifying root causes with increased accuracy of 65.3% over any single machine learning approach.
