Abstract

The properties of a web page strongly affect its overall loading process, including the download of its contents and their progressive rendering in the browser. As a consequence, web page content has a strong impact on the experience of web users. In this paper, we present WebCLUST, a clustering-based classification approach for web pages, which groups pages into quality-meaningful content classes that affect the Quality of Experience (QoE) of users. Groups are defined based on the standard Multipurpose Internet Mail Extensions (MIME) content breakdown and on external subdomain connections, both obtained through in-browser, application-level measurements. Using a large corpus of multi-device, heterogeneous web content and QoE-relevant measurements for the 500 most popular websites on the Internet, we show how WebCLUST can automatically identify relevant web content classes that perform significantly differently in terms of Web QoE-relevant metrics, such as Speed Index. We additionally evaluate the impact of content caching and device type on the identification performance of WebCLUST, showing that content classes might look significantly different depending on the access device type (desktop vs. mobile), as well as when browser caching is considered. Our findings suggest that Web QoE assessment should explicitly consider page content and subdomain embedding in the analysis, especially in recent work on Web QoE inference through machine learning models. To the best of our knowledge, this is the first study showing the impact of web content on Web QoE metrics, opening the door to new Web QoE assessment strategies.
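
To make the grouping idea concrete, the following is a minimal sketch of clustering pages by their MIME content breakdown and external subdomain count. It is not the WebCLUST implementation: the abstract does not name the clustering algorithm, so plain k-means is assumed here purely for illustration. The page records, the `MIME_TYPES` list, the `MAX_SUBDOMAINS` normalization constant, and the function names are all hypothetical.

```python
from math import dist

# Hypothetical MIME categories and normalization constant (assumptions,
# not taken from the paper).
MIME_TYPES = ["image", "script", "css", "html"]
MAX_SUBDOMAINS = 50

# Made-up measurements: bytes downloaded per MIME category and the number
# of external subdomains contacted while loading each page.
PAGES = [
    {"bytes_by_mime": {"image": 800, "script": 100, "css": 50, "html": 50},
     "external_subdomains": 5},
    {"bytes_by_mime": {"image": 100, "script": 700, "css": 100, "html": 100},
     "external_subdomains": 40},
    {"bytes_by_mime": {"image": 750, "script": 150, "css": 50, "html": 50},
     "external_subdomains": 8},
    {"bytes_by_mime": {"image": 150, "script": 650, "css": 100, "html": 100},
     "external_subdomains": 35},
]


def mime_features(page):
    """Return per-MIME byte shares followed by a scaled subdomain count."""
    total = sum(page["bytes_by_mime"].values()) or 1
    shares = [page["bytes_by_mime"].get(m, 0) / total for m in MIME_TYPES]
    return shares + [page["external_subdomains"] / MAX_SUBDOMAINS]


def kmeans(points, k, iterations=20):
    """Plain k-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


features = [mime_features(page) for page in PAGES]
labels, centroids = kmeans(features, k=2)
print(labels)
```

In this toy data the image-heavy pages and the script-heavy pages end up in separate groups. A per-class comparison of a metric such as Speed Index would then be computed over the pages sharing a label.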
