Sequential Counterpart Research Articles

Design of an efficient thread-safe concurrent data structure is a balancing act between implementation complexity and performance. Lock-based concurrent data structures, which are relatively easy to derive from their sequential counterparts and to prove thread-safe, suffer from poor throughput under even light multi-threaded workloads. At the same time, lock-free concurrent structures allow for high throughput, but are notoriously difficult to get right and require careful reasoning to formally establish their correctness. In this work, we explore a solution to this conundrum based on the relatively old idea of batch parallelism, an approach for designing high-throughput concurrent data structures via a simple insight: efficiently processing a batch of a priori known operations in parallel is easier than optimising performance for a stream of arbitrary asynchronous requests. Alas, batch-parallel structures have not seen wide practical adoption due to (i) the inconvenience of having to structure multi-threaded programs to explicitly group operations and (ii) the lack of a systematic methodology to implement batch-parallel structures as simply as lock-based ones. We present OBatcher, a Multicore OCaml library that streamlines the design, implementation, and usage of batch-parallel structures. OBatcher solves the first challenge (how to use) by suggesting a new lightweight implicit batching design pattern that is built on top of generic asynchronous programming mechanisms. The second challenge (how to implement) is addressed by identifying a family of strategies for converting common sequential structures into the corresponding efficient batch-parallel versions, and by providing a library of functors that embody those strategies. We showcase OBatcher with a diverse set of benchmarks ranging from Red-Black and AVL trees to van Emde Boas trees, skip lists, and a thread-safe implementation of a Datalog solver. Our evaluation of all the implementations on large asynchronous workloads shows that (a) they consistently outperform the corresponding coarse-grained lock-based implementations, the only ones available in OCaml to date, and (b) their throughput scales reasonably with the number of processors.
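The implicit batching idea described in this abstract can be illustrated with a minimal sketch. It is written in Python with `asyncio` rather than Multicore OCaml, and it is not OBatcher's API: the `BatchedCounter` class, its `add` method, and the `demo` function are hypothetical names chosen for illustration. Callers issue single operations, and the structure groups whatever requests are pending into one batch and applies them together.

```python
import asyncio


class BatchedCounter:
    """Hypothetical example: a counter with a per-operation interface
    whose updates are applied one batch at a time."""

    def __init__(self):
        self.value = 0
        self.pending = []       # (delta, future) pairs waiting for a batch
        self.scheduled = False  # True once a batch run has been queued
        self.batch_sizes = []   # recorded only so the batching is observable

    async def add(self, delta):
        """Request an increment; resolves to the counter value after it."""
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        self.pending.append((delta, fut))
        if not self.scheduled:
            # The first caller queues the batch run. Callers that arrive
            # before it executes simply join the same batch.
            self.scheduled = True
            loop.call_soon(self._run_batch)
        return await fut

    def _run_batch(self):
        """Apply every pending operation as one batch."""
        batch, self.pending = self.pending, []
        self.scheduled = False
        self.batch_sizes.append(len(batch))
        # The whole batch is known up front. For a counter this is a
        # running sum; a tree or skip list could instead sort the
        # operations and split them across workers.
        running = self.value
        for delta, fut in batch:
            running += delta
            fut.set_result(running)
        self.value = running


async def demo():
    counter = BatchedCounter()
    # Eight independent asynchronous requests, issued without any
    # explicit grouping by the caller.
    results = await asyncio.gather(*(counter.add(1) for _ in range(8)))
    return counter, results


if __name__ == "__main__":
    counter, results = asyncio.run(demo())
    print(counter.value, counter.batch_sizes, results)
```

The design choice shown here is that grouping happens inside the structure, so client code keeps the shape of ordinary asynchronous calls. The batch step itself runs sequentially in this sketch; parallelising it is the part that depends on the underlying data structure.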

The Internet of Things (IoT) is becoming a new socioeconomic revolution in which data and immediacy are the main ingredients. IoT generates large datasets on a daily basis, but these are currently considered “dark data”, i.e., data generated but never analyzed. The efficient analysis of this data is mandatory to create intelligent applications for the next generation of IoT that benefit society. Artificial Intelligence (AI) techniques are very well suited to identifying hidden patterns and correlations in this data deluge. In particular, clustering algorithms are of the utmost importance for performing exploratory data analysis to identify a set (a.k.a. cluster) of similar objects. Clustering algorithms are computationally heavy workloads and need to be executed on high-performance computing (HPC) clusters, especially when dealing with large datasets. This execution on HPC infrastructures is an energy-hungry procedure with additional issues, such as high-latency communications or privacy. Edge computing, a paradigm that enables lightweight computations at the edge of the network, has been proposed recently to solve these issues. In this paper, we provide an in-depth analysis of emergent edge computing architectures that include low-power Graphics Processing Units (GPUs) to speed up these workloads. Our analysis includes performance and power consumption figures for Nvidia’s latest AGX Xavier to compare the energy-performance ratio of these low-cost platforms with a high-performance cloud-based counterpart. Three different clustering algorithms (i.e., k-means, Fuzzy Minimals (FM), and Fuzzy C-Means (FCM)) are designed to be optimally executed on edge and cloud platforms, showing a speed-up factor of up to for the GPU code compared to sequential counterpart versions in the edge platforms and energy savings of up to 150% between the edge computing and HPC platforms.

Articles published on Sequential Counterpart

Concurrent Data Structures Made Easy

On-the-fly spectral unmixing based on Kalman filtering

Proven Distributed Memory Parallelization of Particle Methods

Theory and practice of second-order expansions for moments of 100ρ% accelerated sequential stopping times in parametric and nonparametric estimation with arbitrary fractional ρ

PARALLEL BUCKET-SORT ALGORITHM ON OPTICAL CHAINED-CUBIC TREE INTERCONNECTION NETWORK

Large-Scale Meta-Heuristic Feature Selection Based on BPSO Assisted Rough Hypercuboid Approach.

Influence of shortest path algorithms on energy consumption of multi-core processors

Certified SAT solving with GPU accelerated inprocessing

Parallel-FST: A feature selection library for multicore clusters

Development of NCL equivalent serial and parallel python routines for meteorological data analysis

An Efficient Parallel Framework for the Discrete Element Method Using GPU

An Efficient Parallel Version of Dynamic Multi-Objective Evolutionary Algorithm

An Optimized Parallel Implementation of Non-Iteratively Trained Recurrent Neural Networks

Evaluation of Clustering Algorithms on GPU-Based Edge Computing Platforms.

ACDC: Automated Cell Detection and Counting for Time-Lapse Fluorescence Microscopy.

Parallel deterministic local search heuristic for minimum latency problem

Alternating direction implicit time integrations for finite difference acoustic wave propagation: Parallelization and convergence

High-throughput fuzzy clustering on heterogeneous architectures

CUDA-JMI: Acceleration of feature selection on heterogeneous systems
