Abstract

Over the years, the Web has evolved from serving simple text content from a single server to a complex ecosystem with different types of content delivered by servers spread across several administrative domains. There is anecdotal evidence of users being frustrated with high page load times. Because page load times are known to directly impact user satisfaction, providers would like to understand if and how the complexity of their Web sites affects the user experience. While there is an extensive literature on measuring Web graphs, Web site popularity, and the nature of Web traffic, there has been little work on understanding how complex individual Web sites are, and how this complexity impacts the clients' experience. This paper is a first step toward addressing this gap. To this end, we identify a set of metrics to characterize the complexity of Web sites both at a content level (e.g., number and size of images) and at a service level (e.g., number of servers/origins). We find that the distributions of these metrics are largely independent of a Web site's popularity rank. However, some categories (e.g., News) are more complex than others. More than 60% of Web sites have content from at least five non-origin sources, and these contribute more than 35% of the bytes downloaded. In addition, we analyze which metrics are most critical for predicting page render and load times and find that the number of objects requested is the most important factor. With respect to variability in load times, however, we find that the number of servers is the best indicator.
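
To make the content-level and service-level metrics concrete, the sketch below shows one way such per-page statistics could be computed from a standard HAR (HTTP Archive) capture of a page load. This is an illustrative example, not the measurement pipeline used in the paper; the file path, the page_host argument, and the rule for deciding what counts as a non-origin host are assumptions made here for clarity.

```python
# Illustrative sketch (not the paper's toolchain): derive simple complexity
# metrics for one page load from a HAR 1.2 file.
# Assumptions: "origin" objects are those served from a host ending in the
# page's own hostname; everything else counts as non-origin.
import json
from urllib.parse import urlparse

def complexity_metrics(har_path: str, page_host: str) -> dict:
    with open(har_path) as f:
        entries = json.load(f)["log"]["entries"]

    hosts = set()
    num_objects = 0
    total_bytes = 0
    non_origin_bytes = 0

    for entry in entries:
        host = urlparse(entry["request"]["url"]).hostname or ""
        # bodySize can be -1 when unknown; treat that as zero bytes.
        size = max(entry["response"].get("bodySize", 0), 0)

        num_objects += 1
        total_bytes += size
        hosts.add(host)
        if not host.endswith(page_host):
            non_origin_bytes += size

    return {
        "objects": num_objects,          # content level: number of objects requested
        "bytes": total_bytes,            # content level: total bytes downloaded
        "servers": len(hosts),           # service level: distinct hosts contacted
        "non_origin_byte_share": (       # service level: share of bytes from non-origin hosts
            non_origin_bytes / total_bytes if total_bytes else 0.0
        ),
    }

# Hypothetical usage:
# print(complexity_metrics("www.example.com.har", "example.com"))
```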
