Abstract

Request latency directly affects the performance of modern cloud applications. Due to various causes in hosts and networks, requests can suffer from latency anomalies, which may violate the Service-Level Agreement. However, existing performance monitoring tools have incomplete coverage and inconsistent semantics, and therefore cannot accurately diagnose these anomalies. This paper presents a high-coverage and low-overhead event monitoring system, which monitors buffers to capture most anomaly-related abnormal events with consistent request-level semantics in the end-to-end datapath of requests. First, the system models the datapath of requests as a buffer chain and defines events based on three properties of buffers, so as to monitor the root causes of latency anomalies end to end. Then, to achieve consistent semantics for captured events, it designs a request-level semantics injection mechanism so that events captured in networks carry the victim requests' IDs. Finally, it offloads the software-side semantics operations and event collection to SmartNICs for low CPU overhead. We have implemented the system on commodity SmartNICs and programmable switches. Evaluation results show that it can diagnose 98% of latency anomalies with less than 0.08% network bandwidth overhead and a 0.6% decline in application throughput.
