Abstract

In this paper, we model workloads for a web search system from a performance point of view. We analyze both workload intensity and service demand, which in a web search system correspond to the distribution of query interarrival times and the distribution of per-query execution times, respectively. Our results are derived from experiments on an information retrieval testbed fed with real-world data, and they reveal several unexpected features. We verify in practice that there is high variability both in the interarrival times of queries reaching a search engine and in the service times of queries processed in parallel by a cluster of index servers. We also show that this highly variable behavior can be accurately captured by hyperexponential distributions. These results shed light on the assumption, common in previous analytical models of web search systems, that interarrival times and service times are exponentially distributed. We also find evidence that both the workload intensity and the service demand of a typical web search system exhibit long-range dependence, leading to self-similar behavior. This finding is important because, in the presence of long-range dependence and self-similarity, exponential-based models tend to underestimate response times: self-similarity increases queueing delays, which results in significant performance degradation. Based on our findings, we also discuss possible steps toward a generative model for synthetic workloads.
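
To illustrate why a hyperexponential distribution can capture variability that an exponential one cannot, the following is a minimal sketch of a two-phase hyperexponential (H2) sampler with its squared coefficient of variation (SCV). An exponential distribution always has SCV equal to 1, while an H2 distribution has SCV of at least 1. The function names and the parameter values (`p`, `rate1`, `rate2`) are illustrative assumptions and are not taken from the paper, which does not state its fitted parameters here.

```python
import random


def h2_moments(p, rate1, rate2):
    """Return (mean, SCV) of a two-phase hyperexponential distribution.

    With probability p a sample is exponential with rate1,
    otherwise exponential with rate2.
    """
    mean = p / rate1 + (1 - p) / rate2
    second_moment = 2 * p / rate1**2 + 2 * (1 - p) / rate2**2
    scv = second_moment / mean**2 - 1
    return mean, scv


def sample_h2(p, rate1, rate2, rng):
    """Draw one sample: pick a phase, then draw from its exponential."""
    rate = rate1 if rng.random() < p else rate2
    return rng.expovariate(rate)


def empirical_scv(samples):
    """Sample variance divided by the squared sample mean."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return var / mean**2


# Hypothetical parameters: most arrivals are closely spaced,
# a minority are separated by long gaps.
p, rate1, rate2 = 0.9, 10.0, 0.5

rng = random.Random(42)
samples = [sample_h2(p, rate1, rate2, rng) for _ in range(100_000)]

mean, scv = h2_moments(p, rate1, rate2)
sample_mean = sum(samples) / len(samples)
sample_scv = empirical_scv(samples)

print(f"analytical mean={mean:.3f}, SCV={scv:.3f}")
print(f"empirical  mean={sample_mean:.3f}, SCV={sample_scv:.3f}")
```

Under these assumed parameters the analytical SCV is well above 1, so a queueing model that assumes exponential interarrival or service times with the same mean would understate the variability of the workload.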
