Abstract

Traditionally, performance analysis, debugging, triaging, troubleshooting, and optimization have been left in the hands of performance experts, because performance engineering is considered a specialized domain of expertise. However, this approach requires human effort behind every performance escalation, which is no longer future-proof in enterprise environments for the following reasons: (i) enterprise customers now expect much quicker performance troubleshooting, particularly on cloud platforms offering Software as a Service (SaaS), where billing is subscription-based; (ii) as products become more distributed and complex, the number of performance metrics required to troubleshoot a performance problem explodes, making human analysis very time-consuming; and (iii) our past experience shows that while many customers run into similar performance issues, the human effort needed to troubleshoot each of them in a different infrastructure environment is non-trivial. We believe that data analytics platforms that can quickly mine performance data and point out potential bottlenecks offer a good way for non-experts to debug and solve performance issues. In this work, we present a cloud-based performance data analytics framework that can be used to build tools that analyze and root-cause performance issues in enterprise systems. We describe the architecture of this framework, which consists of: (i) a cloud service (which we term a plugin), (ii) supporting libraries that end systems such as servers or appliance virtual machines (VMs) can use to interact with this plugin, and (iii) a solution to monitor and analyze the results delivered by the plugin. We demonstrate how this platform can be used to develop different performance analysis and debugging tools.
We provide one example of a tool that we have built on top of this framework and released: VMware Virtual SAN (vSAN) performance diagnostics. We specifically discuss how collecting performance data in the cloud from over a thousand deployments, and then analyzing it to detect performance issues, helped us write rules that can easily detect similar performance issues. Finally, we discuss a framework for monitoring the performance of the rules and improving them.
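The abstract does not say how the detection rules are expressed or how their quality is tracked. As a minimal, hypothetical sketch only (not the vSAN implementation), a rule can be modeled as a named predicate over a snapshot of collected metrics, with counters that record how often it fires and how often a firing is later confirmed as a real issue. The class and function names, the metric names (`latency_ms`, `outstanding_io`, `iops`), and the thresholds are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical metric snapshot: metric name -> observed value.
Metrics = Dict[str, float]


@dataclass
class Rule:
    """Hypothetical detection rule: a named predicate over a metric snapshot."""
    name: str
    condition: Callable[[Metrics], bool]
    hits: int = 0       # number of snapshots on which the rule fired
    confirmed: int = 0  # number of firings confirmed as real issues

    def precision(self) -> float:
        # Fraction of firings confirmed as true issues; 0.0 if it never fired.
        return self.confirmed / self.hits if self.hits else 0.0


def evaluate(rules: List[Rule], snapshot: Metrics) -> List[str]:
    """Return the names of the rules whose condition holds for this snapshot."""
    fired = []
    for rule in rules:
        if rule.condition(snapshot):
            rule.hits += 1
            fired.append(rule.name)
    return fired


def record_feedback(rule: Rule, was_real_issue: bool) -> None:
    """Record whether a firing was confirmed, so rule quality can be monitored."""
    if was_real_issue:
        rule.confirmed += 1


if __name__ == "__main__":
    # Illustrative rules; metric names and thresholds are assumptions.
    rules = [
        Rule("high_latency", lambda m: m.get("latency_ms", 0.0) > 50.0),
        Rule("queue_buildup",
             lambda m: m.get("outstanding_io", 0.0) > 64 and m.get("iops", 0.0) < 1000),
    ]
    snapshot = {"latency_ms": 80.0, "outstanding_io": 10.0, "iops": 5000.0}
    print(evaluate(rules, snapshot))  # ['high_latency']
    record_feedback(rules[0], was_real_issue=True)
    print(rules[0].precision())       # 1.0
```

Keeping the hit and confirmation counters on each rule is one simple way to support the rule-monitoring step the abstract mentions: rules with low precision become candidates for refinement.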
