Abstract

Cloud services provide auto-scaling and load balancing to meet the performance requirements of an application. Existing auto-scaling techniques scale cloud resources up and down to distribute dynamically varying workloads. However, bursty workloads pose many challenges for auto-scaling and can result in Service Level Agreement (SLA) violations. Furthermore, over-provisioning or under-provisioning cloud resources in response to dynamically evolving workloads leads to performance degradation and cost escalation. In this work, we present a workload-characterization-based approach for scheduling bursty workloads on a highly scalable serverless architecture in conjunction with a machine learning (ML) platform. We present the use of the Amazon Web Services (AWS) ML platform SageMaker and the serverless computing platform Lambda for load balancing the inference workload to avoid SLA violations. We evaluate our approach using a recommender system based on a deep learning model for inference.
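The Lambda-plus-SageMaker pattern described above can be sketched as a Lambda function that forwards each inference request to a SageMaker endpoint. This is a minimal illustration, not the paper's implementation: the endpoint name `recommender-endpoint` is hypothetical, and the runtime client is injected so the handler can be exercised without AWS access (in a deployed Lambda it would be `boto3.client("sagemaker-runtime")`, whose `invoke_endpoint` call is shown here).

```python
import json


def make_handler(client):
    """Build a Lambda handler bound to a SageMaker runtime client.

    `client` stands in for boto3.client("sagemaker-runtime");
    injecting it keeps the handler testable outside AWS.
    """

    def handler(event, context):
        # Forward the request body to the (hypothetical) inference
        # endpoint; SageMaker returns the model's prediction as a
        # streaming body.
        response = client.invoke_endpoint(
            EndpointName="recommender-endpoint",  # hypothetical name
            ContentType="application/json",
            Body=json.dumps(event["body"]),
        )
        return {
            "statusCode": 200,
            "body": response["Body"].read().decode("utf-8"),
        }

    return handler
```

Because Lambda scales each concurrent request onto its own execution environment, a burst of inference requests fans out across many such handlers instead of queuing on a fixed pool of instances.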
