User-facing Services Research Articles

User-facing services are evolving toward the microservice architecture, in which a service is built by connecting multiple microservice stages. Because the entire service is heavy, this architecture creates the opportunity to offload only some microservice stages to edge devices close to the end users. However, emerging techniques often violate the Quality-of-Service (QoS) of microservice-based services in the cloud-edge continuum, because they consider neither the communication overhead nor the resource contention between microservices and external co-located tasks. We propose Nautilus, a runtime system that effectively deploys microservice-based user-facing services in the cloud-edge continuum. Nautilus ensures the QoS of microservice-based user-facing services while minimizing the required computational resources. It comprises a communication-aware microservice mapper, a contention-aware resource manager, and an IO-sensitive and load-aware microservice migration scheduler. The mapper divides the microservice graph into multiple partitions based on the communication overhead and maps the partitions to appropriate nodes. On each node, the resource manager determines the optimal resource allocation for its microservices using reinforcement learning, which can capture the complex contention behaviors. When microservices suffer from external IO pressure, the IO-sensitive microservice scheduler migrates the critical ones to idle nodes. Furthermore, when the load of microservices changes dynamically, the load-aware microservice scheduler migrates microservices from busy nodes to idle ones to ensure the QoS goal of the entire service. Our experimental results show that Nautilus can guarantee the required QoS target under external shared-resource contention, whereas state-of-the-art systems suffer from QoS violations. Meanwhile, Nautilus reduces computational resource usage by 23.9% and network bandwidth usage by 53.4% while achieving the required 99%-ile latency.
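
To make the idea of communication-aware partitioning concrete, the following is a minimal sketch, not the algorithm used by Nautilus (the abstract does not specify it). It assumes a simple greedy heuristic: the pairs of microservices that exchange the most traffic are merged into the same partition first, as long as the merged partition still fits on one node. The names `cpu_demand`, `traffic`, and `node_capacity`, and the example services, are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def partition_by_communication(
    cpu_demand: Dict[str, float],
    traffic: Dict[Tuple[str, str], float],
    node_capacity: float,
) -> List[Set[str]]:
    """Greedy sketch: co-locate the most talkative microservice pairs.

    Assumption (not from the source): a partition is feasible when the
    summed CPU demand of its microservices fits within node_capacity.
    """
    # Start with every microservice in its own partition.
    part_of: Dict[str, Set[str]] = {svc: {svc} for svc in cpu_demand}

    # Visit edges from the heaviest to the lightest traffic volume.
    for (a, b), _weight in sorted(traffic.items(), key=lambda kv: -kv[1]):
        pa, pb = part_of[a], part_of[b]
        if pa is pb:
            continue  # already co-located
        merged = pa | pb
        if sum(cpu_demand[s] for s in merged) <= node_capacity:
            for s in merged:
                part_of[s] = merged

    # Collect each distinct partition once, in first-seen order.
    seen, partitions = set(), []
    for p in part_of.values():
        if id(p) not in seen:
            seen.add(id(p))
            partitions.append(p)
    return partitions


def cross_partition_traffic(
    partitions: List[Set[str]], traffic: Dict[Tuple[str, str], float]
) -> float:
    """Traffic that must cross the network between partitions."""
    index = {s: i for i, p in enumerate(partitions) for s in p}
    return sum(w for (a, b), w in traffic.items() if index[a] != index[b])


# Hypothetical four-stage service; traffic is in arbitrary units.
cpu_demand = {"gateway": 1.0, "auth": 1.0, "search": 2.0, "rank": 2.0}
traffic = {
    ("gateway", "auth"): 5.0,
    ("gateway", "search"): 50.0,
    ("search", "rank"): 80.0,
    ("auth", "rank"): 1.0,
}
partitions = partition_by_communication(cpu_demand, traffic, node_capacity=4.0)
remote = cross_partition_traffic(partitions, traffic)
```

With these illustrative numbers, the heaviest edge (`search` to `rank`) is kept inside one partition, and only the traffic on the edges that could not be merged under the capacity limit crosses the network. A real mapper would also have to weigh edge-versus-cloud placement and link latency, which this sketch ignores.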


Modern warehouse-scale computers (WSCs) are being outfitted with accelerators to provide the significant compute required by emerging intelligent personal assistant (IPA) workloads such as voice recognition, image classification, and natural language processing. It is well known that the diurnal access pattern of user-facing services provides a strong incentive to co-locate applications for better accelerator utilization and efficiency, and prior work has focused on enabling co-location on multicore processors. However, interference when co-locating applications on non-preemptive accelerators is fundamentally different from contention on multicore CPUs and introduces a new set of challenges for reducing QoS violations. To address this open problem, we first identify the underlying causes of QoS violations in accelerator-outfitted servers. Our experiments show that queuing delay for the compute resources and PCI-e bandwidth contention for data transfer are the two main factors that contribute to the long tail latencies of user-facing applications. We then present Baymax, a runtime system that orchestrates the execution of compute tasks from different applications and mitigates PCI-e bandwidth contention to deliver the required QoS for user-facing applications and increase accelerator utilization. Using DjiNN, a deep neural network service, Sirius, an end-to-end IPA workload, and traditional applications on an Nvidia K40 GPU, our evaluation shows that Baymax improves accelerator utilization by 91.3% while achieving the desired 99%-ile latency target for user-facing applications. In fact, Baymax reduces the 99%-ile latency of user-facing applications by up to 195x over default execution.
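
The queuing-delay problem on a non-preemptive accelerator can be illustrated with a small sketch. This is not Baymax's implementation, which the abstract does not detail. It assumes a simple headroom rule: once a kernel is launched it cannot be interrupted, so a batch task is admitted only if its predicted duration fits in the latency slack left for the user-facing query. All function and parameter names are hypothetical.

```python
def qos_headroom_ms(
    qos_target_ms: float,
    elapsed_ms: float,
    predicted_remaining_ms: float,
) -> float:
    """Slack left before the user-facing query would miss its target."""
    return qos_target_ms - elapsed_ms - predicted_remaining_ms


def can_launch_batch_task(
    qos_target_ms: float,
    elapsed_ms: float,
    predicted_remaining_ms: float,
    predicted_batch_ms: float,
) -> bool:
    """Admit a batch task only if it fits entirely within the headroom.

    Assumption (not from the source): task durations can be predicted
    before launch, and a launched task runs to completion.
    """
    headroom = qos_headroom_ms(qos_target_ms, elapsed_ms, predicted_remaining_ms)
    return predicted_batch_ms <= headroom


# Hypothetical query: 100 ms target, 30 ms already spent, 40 ms of work left.
short_ok = can_launch_batch_task(100.0, 30.0, 40.0, predicted_batch_ms=20.0)
long_ok = can_launch_batch_task(100.0, 30.0, 40.0, predicted_batch_ms=50.0)
```

Here the 20 ms batch task fits in the 30 ms of slack and would be admitted, while the 50 ms task would be held back. The rule covers only compute queuing; PCI-e bandwidth contention, the second factor named in the abstract, would need separate handling.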


Articles published on User-facing Services

A self-stabilizing and auto-provisioning orchestration for microservices in edge-cloud continuum

AutoMan: Resource-efficient provisioning with tail latency guarantees for microservices

Adaptive Resource Efficient Microservice Deployment in Cloud-Edge Continuum

Data protection record of processing activities and privacy notice generator toolkit by EMBL’s European Bioinformatics Institute

Autonomous Lifecycle Management for Resource-Efficient Workload Orchestration for Green Edge Computing

Toward QoS-Awareness and Improved Utilization of Spatial Multitasking GPUs

Every Timestamp Counts: Accurate Tracking of Network Latencies Using Reconcilable Difference Aggregator

Baymax

Server Engineering Insights for Large-Scale Online Services
