Abstract

Modern latency-critical online services such as search engines often process requests by consulting large input data spanning massive parallel components. Hence the tail latency of these components determines the service latency. To trade off result accuracy for tail latency reduction, existing techniques use the components responding before a specified deadline to produce approximate results. However, they skip a large proportion of components when load gets heavier, thus incurring large accuracy losses. In this paper, we propose CLAP to enable component-level approximate processing of requests for low tail latency and small accuracy losses. CLAP aggregates information of input data to create small aggregated data points. Using these points, CLAP reduces latency variance of parallel components and allows them to produce initial results quickly; CLAP also identifies the parts of input data most related to requests’ result accuracies, thus first using these parts to improve the produced results to minimize accuracy losses. We evaluated CLAP using real services and datasets. The results show: (i) CLAP reduces tail latency by 6.46 times with accuracy losses of 2.2 percent compared to existing exact processing techniques; (ii) when using the same latency, CLAP reduces accuracy losses by 31.58 times compared to existing approximate processing techniques.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call