Abstract

With the prevalence of big-data-driven applications, such as face recognition on smartphones and tailored recommendations from Google Ads, we are on the road to a lifestyle with significantly more intelligence than ever before. Various neural network powered models are running at the back end of their intelligence to enable quick responses to users. Supporting those models requires lots of cloud-based computational resources, e.g., CPUs and GPUs. The cloud providers charge their clients by the amount of resources that they occupy. Clients have to balance the budget and quality of experiences (e.g., response time). The budget leans on individual business owners, and the required Quality of Experience (QoE) depends on usage scenarios of different applications. For instance, an autonomous vehicle requires an real-time response, but unlocking your smartphone can tolerate delays. However, cloud providers fail to offer a QoE-based option to their clients. In this paper, we propose DQoES, differentiated quality of experience scheduler for deep learning inferences. DQoES accepts clients' specifications on targeted QoEs, and dynamically adjusts resources to approach their targets. Through the extensive cloud-based experiments, DQoES demonstrates that it can schedule multiple concurrent jobs with respect to various QoEs and achieve up to 8x times more satisfied models when compared to the existing system.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.