Runtime Overhead Research Articles

Serverless computing is becoming a promising paradigm for Distributed Deep Neural Network (DDNN) training in the cloud, as it allows users to decompose complex model training into a number of functions without managing virtual machines or servers. Though provided with a simpler resource interface (i.e., function number and memory size), inadequate function resource provisioning (either under-provisioning or over-provisioning) easily leads to unpredictable DDNN training performance in serverless platforms. Our empirical studies on AWS Lambda indicate that, such unpredictable performance of serverless DDNN training is mainly caused by the resource bottleneck of Parameter Servers (PS) and small local batch size. In this article, we design and implement <inline-formula><tex-math notation="LaTeX">$\lambda$</tex-math><alternatives><mml:math><mml:mi>λ</mml:mi></mml:math><inline-graphic xlink:href="xu-ieq1-3054656.gif"/></alternatives></inline-formula>DNN, a cost-efficient function resource provisioning framework to provide predictable performance for serverless DDNN training workloads, while saving the budget of provisioned functions. Leveraging the PS network bandwidth and function CPU utilization, we build a lightweight analytical DDNN training performance model to enable our design of <inline-formula><tex-math notation="LaTeX">$\lambda$</tex-math><alternatives><mml:math><mml:mi>λ</mml:mi></mml:math><inline-graphic xlink:href="xu-ieq2-3054656.gif"/></alternatives></inline-formula>DNN resource provisioning strategy, so as to guarantee DDNN training performance with serverless functions. Extensive prototype experiments on AWS Lambda and complementary trace-driven simulations demonstrate that, <inline-formula><tex-math notation="LaTeX">$\lambda$</tex-math><alternatives><mml:math><mml:mi>λ</mml:mi></mml:math><inline-graphic xlink:href="xu-ieq3-3054656.gif"/></alternatives></inline-formula>DNN can deliver predictable DDNN training performance and save the monetary cost of function resources by up to 66.7 percent, compared with the state-of-the-art resource provisioning strategies, yet with an acceptable runtime overhead.

Read full abstract

Atomicity is a correctness criterion to reason about isolated code regions in a multithreaded program when they are executed concurrently. However, dynamic instances of these code regions, called transactions , may fail to behave atomically, resulting in transactional atomicity violations. Existing dynamic online atomicity checkers incur either false positives or false negatives in detecting transactions experiencing transactional atomicity violations. This article proposes <monospace>RegionTrack</monospace>. <monospace>RegionTrack</monospace> tracks cross-thread dependences at the event, dynamic subregion, and transaction levels. It maintains both dynamic subregions within selected transactions and transactional happens-before relations through its novel timestamp propagation approach. We prove that <monospace>RegionTrack</monospace> is sound and complete in detecting both transactional atomicity violations and non-serializable traces. To the best of our knowledge, it is the first online technique that precisely captures the transitively closed set of happens-before relations over all conflicting events with respect to every running transaction for the above two kinds of issues. We have evaluated <monospace>RegionTrack</monospace> on 19 subjects of the DaCapo and the Java Grande Forum benchmarks. The empirical results confirm that <monospace>RegionTrack</monospace> precisely detected all those transactions which experienced transactional atomicity violations and identified all non-serializable traces. The overall results also show that <monospace>RegionTrack</monospace> incurred 1.10x and 1.08x lower memory and runtime overheads than <monospace>Velodrome</monospace> and 2.10x and 1.21x lower than <monospace>Aerodrome</monospace>, respectively. Moreover, it incurred 2.89x lower memory overhead than <monospace>DoubleChecker</monospace>. On average, <monospace>Velodrome</monospace> detected about 55% fewer violations than <monospace>RegionTrack</monospace>, which in turn reported about 3%–70% fewer violations than <monospace>DoubleChecker</monospace>.

Read full abstract

Runtime Overhead Research Articles

Related Topics

Articles published on Runtime Overhead

λDNN: Achieving Predictable Distributed DNN Training With Serverless Architectures

Dynamic task allocation and scheduling with contention-awareness for Network-on-Chip based multicore systems

A Co-Design Adaptive Defense Scheme With Bounded Security Damages Against Heartbleed-Like Attacks

Enhancing Dynamic Binary Translation in Mobile Computing by Leveraging Polyhedral Optimization

Algorithm-Based Fault Tolerance for Convolutional Neural Networks

Constructing Object Groups Corresponding to Concepts for Recovery of a Summarized Sequence Diagram

Validity Tracking Based Log Management for In-Memory Databases

RegionTrack

Mitigating Data-only Attacks by Protecting Memory-resident Sensitive Data

Large-scale Debloating of Binary Shared Libraries

Efficient Buffer Overflow Detection on GPU

Efficient Provenance Management via Clustering and Hybrid Storage in Big Data Environments

Platform-Independent Dynamic Taint Analysis for JavaScript

Eliminating abstraction overhead of Java stream pipelines using ahead-of-time program optimization

On Architectural Support for Instruction Set Randomization

Code generation for energy‐efficient execution of dynamic streaming task graphs on parallel and heterogeneous platforms

ABCFI: Fast and Lightweight Fine-Grained Hardware-Assisted Control-Flow Integrity

A Hardware-Based Architecture-Neutral Framework for Real-Time IoT Workload Forensics

A Control Theoretic Approach to ABR Video Streaming: A Fresh Look at PID-Based Rate Adaptation

RATScope: Recording and Reconstructing Missing RAT Semantic Behaviors for Forensic Analysis on Windows

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Runtime Overhead Research Articles

Related Topics

Articles published on Runtime Overhead

λDNN: Achieving Predictable Distributed DNN Training With Serverless Architectures

Dynamic task allocation and scheduling with contention-awareness for Network-on-Chip based multicore systems

A Co-Design Adaptive Defense Scheme With Bounded Security Damages Against Heartbleed-Like Attacks

Enhancing Dynamic Binary Translation in Mobile Computing by Leveraging Polyhedral Optimization

Algorithm-Based Fault Tolerance for Convolutional Neural Networks

Constructing Object Groups Corresponding to Concepts for Recovery of a Summarized Sequence Diagram

Validity Tracking Based Log Management for In-Memory Databases

RegionTrack

Mitigating Data-only Attacks by Protecting Memory-resident Sensitive Data

Large-scale Debloating of Binary Shared Libraries

Efficient Buffer Overflow Detection on GPU

Efficient Provenance Management via Clustering and Hybrid Storage in Big Data Environments

Platform-Independent Dynamic Taint Analysis for JavaScript

Eliminating abstraction overhead of Java stream pipelines using ahead-of-time program optimization

On Architectural Support for Instruction Set Randomization

Code generation for energy‐efficient execution of dynamic streaming task graphs on parallel and heterogeneous platforms

ABCFI: Fast and Lightweight Fine-Grained Hardware-Assisted Control-Flow Integrity

A Hardware-Based Architecture-Neutral Framework for Real-Time IoT Workload Forensics

A Control Theoretic Approach to ABR Video Streaming: A Fresh Look at PID-Based Rate Adaptation

RATScope: Recording and Reconstructing Missing RAT Semantic Behaviors for Forensic Analysis on Windows