Workload Consolidation Research Articles

Artificial intelligence (AI) and machine learning (ML) are transforming Industrial Internet of Things (IIoT) segments by enabling higher productivity, better insights, less downtime, and superior product quality. Through AI-inspired innovations, businesses are gaining a sizable competitive edge and product leadership. To realize the promise and potential of AI, software needs to enable fast prototyping and experimentation at scale, and efficiently turn data into valuable insights. A speedup of over 100× is possible with software that is well parallelized and vectorized, reuses data effectively, applies cache blocking, makes good use of prefetching, and, above all, is abstracted behind the familiar industry-standard APIs and libraries used by ML developers and data scientists. With this increase in software efficiency, a customer can serve more insights per second and increase inference throughput on the same hardware system. Our software optimizations target a general-purpose CPU, which, in addition to running AI models and pipelines, can host other IIoT applications efficiently through workload consolidation. Both general-purpose workloads and AI model inferencing run on the same hardware targets while minimizing the energy, maintenance, and total cost of ownership of a heterogeneous infrastructure. Edge IIoT also has unique power and space constraints that can be addressed well with multi-stream inferencing support on general-purpose CPUs. It is imperative to optimize software for all phases of the end-to-end AI pipeline to run "efficient AI" and realize quicker insights, thereby turning them into concrete business results for IIoT solutions. In this article, using optimized frameworks such as Intel Extension for PyTorch, Intel Extension for Scikit-learn, and Intel Distribution of Modin, we obtain a 3.6× to 81× improvement in end-to-end pipeline performance. For the CNN-based anomaly detection pipeline and the predictive analytics pipeline, this means a higher throughput of insights and the ability to serve multiple streams of inference across the shop floor, thereby maximizing the potential of the IIoT AI solution. Popular and relevant IIoT AI use cases can realize significant performance improvement when software and AI implementations are purposefully optimized for the target AI hardware and systems.

Reducing tail latency is increasingly important for improving the user-perceived service experience. User-facing, latency-sensitive cloud applications typically contain multiple interactive tiers (e.g., Web, App, Database) running in different virtual machines (VMs) with complex interaction patterns. However, such interactions between VMs in different tiers are often neglected in previous VM consolidation methods, resulting in poor application performance. In this article, we study the consolidation of multi-tier interactive workloads from a new perspective of user-perceived tail latency. We propose a novel profiling-based consolidation methodology that satisfies tail latency requirements while reducing the number of physical machines used. To achieve this goal, we first perform large-scale profiling experiments under various consolidation settings in a KVM-virtualized private cluster to establish empirical performance values. We consider two key factors that affect the tail latency of multi-tier workloads: interference with co-located VMs and interaction between tiers. We model the consolidation of multi-tier workloads as an optimization problem with different objectives and constraints, and derive the consolidation schedule. We implement and evaluate the proposed models and compare them with other methods (i.e., without profiling or without considering interaction influence). Extensive experimental results show that the proposed method reduces tail latency by up to 5× compared with the method without profiling, and by up to 1.3× compared with the method that does not consider the interaction influence between different tiers.
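As a rough illustration of the idea in this abstract, the following minimal sketch places VMs on as few hosts as possible while keeping each application's estimated tail latency under a target. It is not the authors' model: the abstract describes an optimization formulation, whereas this sketch swaps in a simple first-fit greedy heuristic. All names and numbers (`BASE_P99_MS`, `INTERFERENCE_MS`, `CROSS_HOST_HOP_MS`, `estimate_p99`, `consolidate`) are hypothetical stand-ins for values that profiling experiments would supply.

```python
from dataclasses import dataclass

# Hypothetical profile table in milliseconds; a real study would measure these.
BASE_P99_MS = {"web": 20.0, "app": 30.0, "db": 40.0}
INTERFERENCE_MS = 8.0     # assumed penalty per VM co-located on the same host
CROSS_HOST_HOP_MS = 5.0   # assumed penalty when adjacent tiers sit on different hosts
TIER_ORDER = ["web", "app", "db"]


@dataclass(frozen=True)
class VM:
    app: str
    tier: str


def estimate_p99(app, placement):
    """Estimate one application's tail latency; placement maps VM -> host index."""
    tiers = {vm.tier: vm for vm in placement if vm.app == app}
    total = 0.0
    for tier, vm in tiers.items():
        host = placement[vm]
        # Interference: every other VM on the same host adds a fixed penalty.
        neighbours = sum(1 for o, h in placement.items() if h == host and o != vm)
        total += BASE_P99_MS[tier] + INTERFERENCE_MS * neighbours
    # Interaction: adjacent tiers placed on different hosts pay a network hop.
    for a, b in zip(TIER_ORDER, TIER_ORDER[1:]):
        if a in tiers and b in tiers and placement[tiers[a]] != placement[tiers[b]]:
            total += CROSS_HOST_HOP_MS
    return total


def consolidate(vms, slo_ms, max_vms_per_host):
    """First-fit placement: reuse an existing host if the SLO still holds."""
    placement, hosts_used = {}, 0
    for vm in vms:
        # Try every host already in use, then one fresh host as a last resort.
        for host in range(hosts_used + 1):
            if sum(1 for h in placement.values() if h == host) >= max_vms_per_host:
                continue
            trial = {**placement, vm: host}
            # Only apps with a VM on this host (plus the new VM's app) change.
            affected = {v.app for v, h in trial.items() if h == host}
            if all(estimate_p99(a, trial) <= slo_ms for a in affected):
                placement, hosts_used = trial, max(hosts_used, host + 1)
                break
        else:
            raise ValueError(f"cannot place {vm} within {slo_ms} ms")
    return placement, hosts_used


if __name__ == "__main__":
    vms = [VM(a, t) for a in ("A", "B") for t in TIER_ORDER]
    placement, hosts = consolidate(vms, slo_ms=150.0, max_vms_per_host=3)
    print(hosts, {f"{v.app}-{v.tier}": h for v, h in placement.items()})
```

Tightening `slo_ms` in this toy setup forces the heuristic to spread VMs over more hosts, which mirrors the trade-off between tail latency and the number of physical machines that the abstract describes.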

Articles published on Workload Consolidation

Efficient resource allocation in cloud computing environments using AI-driven predictive analytics

Lavender: An Efficient Resource Partitioning Framework for Large-Scale Job Colocation

Multi-resource predictive workload consolidation approach in virtualized environments

A novel hybrid meta‐heuristic‐oriented latency sensitive cloud object storage system

An Efficient Framework for Utilizing Underloaded Servers in Compute Cloud

Cloud Host Selection using Iterative Particle-Swarm Optimization for Dynamic Container Consolidation

End-to-End Industrial IoT: Software Optimization and Acceleration

Holistic Resource Allocation Under Federated Scheduling for Parallel Real-time Tasks

Towards Deadline Guaranteed Cloud Storage Services

A power and thermal-aware virtual machine management framework based on machine learning

Comparison of workload consolidation algorithms for cloud data centers

SDRP: Safe, Efficient, and SLO-Aware Workload Consolidation Through Secure and Dynamic Resource Partitioning

Experimental and Computational Investigations of the Thermal Environment in a Small Operational Data Center for Potential Energy Efficiency Improvements

Coupling energy efficiency and quality for consolidation of cloud workloads

Energy-efficient strategy for virtual machine consolidation in cloud environment

Interactive Context for Mobile OS Resource Management

Multi-Tier Workload Consolidations in the Cloud: Profiling, Modeling and Optimization

Micro-Economic Benefits of Peer-Producing Containerized Network Functions

Utilization Driven Model for Server Consolidation in Cloud Data Centers

Cacol: A zero overhead and non-intrusive double caching mitigation system
