Abstract

For public cloud providers, maintaining the availability of cloud services is critical, and it requires efficient anomaly diagnosis and recovery. The first step is to localize each anomaly, i.e., to determine where it occurs on the network path between the cloud and its clients. We propose FlowPinpoint, a system that performs anomaly localization for cloud providers. FlowPinpoint collects per-flow statistics at the cloud network gateways (gateway flowlogs), so the collected data reflects conditions on both the cloud side and the Internet side. Alibaba's big data computing platform aggregates and associates the gateway flowlogs at datacenter scale. To keep anomaly-unrelated flowlogs from disturbing the analysis, we propose a two-layer filter consisting of an indicator-based filter and an isolation forest filter. Finally, the anomaly localization analyzer classifies the flowlogs and uses the classification results to determine whether the anomaly lies inside the cloud network. FlowPinpoint is implemented and tested in the production environment of Alibaba Cloud, where it correctly localized 1 anomaly inside the cloud and 6 anomalies on the Internet over 4 months.
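
The two-layer filter can be illustrated with a minimal sketch. It is not FlowPinpoint's implementation: the abstract does not list the flowlog fields, the indicators, or any thresholds, so the `Flowlog` fields (`rtt_ms`, `retrans_rate`), the cut-off values, and all function names below are hypothetical. The sketch also assumes that the second layer keeps the flows an isolation forest scores as outliers, and it uses a small pure-Python isolation forest rather than whatever the production system runs.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Flowlog:
    # Hypothetical per-flow statistics; the real gateway flowlog schema is not given.
    flow_id: str
    rtt_ms: float
    retrans_rate: float


def indicator_filter(flows, max_retrans_rate=0.01, max_rtt_ms=100.0):
    """Layer 1: keep flows whose indicators cross an assumed threshold."""
    return [f for f in flows
            if f.retrans_rate > max_retrans_rate or f.rtt_ms > max_rtt_ms]


def _c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def _build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            _build_tree(left, depth + 1, max_depth, rng),
            _build_tree(right, depth + 1, max_depth, rng))


def _path_length(tree, point):
    """Depth at which the point lands, plus the expected depth of the unsplit leaf."""
    depth = 0
    while tree[0] == "node":
        _, dim, split, left, right = tree
        tree = left if point[dim] < split else right
        depth += 1
    return depth + _c(tree[1])


def isolation_scores(points, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1]; values near 1 mean the point is easy to isolate."""
    if len(points) < 2:
        return [0.5] * len(points)
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(psi))
    trees = [_build_tree(rng.sample(points, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = n_trees * _c(psi)
    return [2.0 ** (-sum(_path_length(t, p) for t in trees) / norm)
            for p in points]


def isolation_forest_filter(flows, score_threshold=0.7, seed=0):
    """Layer 2: keep flows scored above an assumed outlier threshold."""
    points = [(f.rtt_ms, f.retrans_rate) for f in flows]
    scores = isolation_scores(points, seed=seed)
    return [f for f, s in zip(flows, scores) if s > score_threshold]
```

Chaining the layers as `isolation_forest_filter(indicator_filter(flows))` applies the cheap threshold check first, so the forest is built only over flows that already look degraded. The ordering and both thresholds are illustrative choices, not values reported by the authors.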
