Communication Patterns Of Applications Research Articles

This article addresses the problem of improving the efficiency of parallel applications. It proposes an approach that aims to reduce the communication overhead of data exchange between the processes of a parallel program during its execution on a high-performance computing system. A growing number of compute nodes increases the impact of communication overhead on application performance. As a consequence, the problem of allocating parallel processes to hardware nodes (the mapping problem) becomes highly relevant. This work proposes a new approach to the mapping problem. Its key characteristic is that a communication pattern is extracted by phase analysis of the application, and a convolutional neural network is then used to quickly choose the most relevant mapping algorithm for the extracted pattern. The communication pattern is built by analyzing the point-to-point communications between the processes of the parallel program. The application timeline is divided into equal intervals, and a communication pattern is created for each interval. A 2D Haar wavelet transform is then applied to these patterns to generate features. The features are clustered, and the timeline is then split into phases, each with its own communication pattern. The most relevant mapping algorithm is selected using a convolutional neural network. This presupposes some knowledge of the different types of parallel applications and of the mapping algorithms most suitable for them. This knowledge must be represented as a set of pattern classes (classes of matrices), where each class is associated with the mapping algorithm that best fits applications with that type of pattern. This set can serve as a training sample for the neural network. Thus, the neural network classifies the communication pattern of an input application and finds a mapping algorithm for it. The stages of the proposed approach are implemented in this work, and the implementation is demonstrated on several tests.
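The phase-analysis steps in the abstract above (equal intervals, one communication matrix per interval, 2D Haar features, grouping into phases) can be illustrated with a minimal sketch. It is not the authors' implementation: the event format `(time, src, dst, nbytes)`, the function names, and the threshold are assumptions made for illustration, and the clustering step is replaced by a simpler rule that starts a new phase whenever consecutive feature vectors differ by more than a threshold. The CNN classification stage is not shown.

```python
def interval_matrices(events, n_procs, t_end, n_intervals):
    """Build one n_procs x n_procs byte-count matrix per equal time interval.

    events: iterable of (time, src, dst, nbytes) point-to-point messages
    (a hypothetical trace format chosen for this sketch).
    """
    width = t_end / n_intervals
    mats = [[[0.0] * n_procs for _ in range(n_procs)] for _ in range(n_intervals)]
    for t, src, dst, nbytes in events:
        k = min(int(t / width), n_intervals - 1)  # clamp t == t_end into the last interval
        mats[k][src][dst] += nbytes
    return mats


def haar2d(m):
    """One level of the orthonormal 2D Haar transform of an even-sized square matrix.

    Returns the four sub-bands (approximation, horizontal, vertical, diagonal detail).
    """
    h = len(m) // 2
    ll = [[0.0] * h for _ in range(h)]
    lh = [[0.0] * h for _ in range(h)]
    hl = [[0.0] * h for _ in range(h)]
    hh = [[0.0] * h for _ in range(h)]
    for i in range(h):
        for j in range(h):
            a, b = m[2 * i][2 * j], m[2 * i][2 * j + 1]
            c, d = m[2 * i + 1][2 * j], m[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 2
            lh[i][j] = (a - b + c - d) / 2
            hl[i][j] = (a + b - c - d) / 2
            hh[i][j] = (a - b - c + d) / 2
    return ll, lh, hl, hh


def features(m):
    """Flatten the four Haar sub-bands into a single feature vector."""
    return [v for band in haar2d(m) for row in band for v in row]


def split_phases(mats, threshold):
    """Group consecutive intervals into phases.

    Simplification (assumption): instead of clustering, a new phase starts when
    the Euclidean distance between neighbouring feature vectors exceeds threshold.
    """
    phases = [[0]]
    prev = features(mats[0])
    for k in range(1, len(mats)):
        cur = features(mats[k])
        dist = sum((x - y) ** 2 for x, y in zip(prev, cur)) ** 0.5
        if dist > threshold:
            phases.append([])
        phases[-1].append(k)
        prev = cur
    return phases


# Toy trace for 4 processes: a ring exchange, then a gather to process 0.
ring = [(t, p, (p + 1) % 4, 100) for t in (0.5, 1.5) for p in range(4)]
gather = [(t, p, 0, 100) for t in (2.5, 3.5) for p in range(1, 4)]
mats = interval_matrices(ring + gather, n_procs=4, t_end=4.0, n_intervals=4)
phases = split_phases(mats, threshold=1.0)
print(phases)  # [[0, 1], [2, 3]]
```

In this toy trace the first two intervals share the ring pattern and the last two share the gather pattern, so the timeline splits into two phases. Summing the interval matrices within each phase would give the per-phase communication pattern that a classifier could then take as input.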

Emerging embedded applications are based on evolving standards (e.g., MPEG2/4, H.264/265, IEEE802.11a/b/g/n). Since most of these applications run on handheld devices, there is an increasing need for a single-chip solution that can dynamically interoperate between different standards and their derivatives. To achieve high resource utilization and low power dissipation, we propose REDEFINE, a polymorphic ASIC in which specialized hardware units are replaced with basic hardware units that can provide the same functionality through runtime re-composition. It is a “future-proof” custom hardware solution for multiple applications and their derivatives in a domain. In this article, we describe a compiler framework and supporting hardware comprising compute, storage, and communication resources. Applications described in a high-level language (e.g., C) are compiled into application substructures. For each application substructure, a set of compute elements (CEs) on the hardware is interconnected at runtime to form a pattern that closely matches the communication pattern of that particular application. The advantage is that the bounded CEs are neither processor cores nor logic elements as in FPGAs. Hence, REDEFINE offers the power and performance advantages of an ASIC together with the hardware reconfigurability and programmability of an FPGA or instruction-set processor. In addition, the hardware supports custom instruction pipelining. Existing instruction-set extensible processors identify sequences of instructions that occur repeatedly within the application and, at design time, create custom instructions that speed up the execution of these sequences. We extend this scheme further: a kernel is compiled into custom instructions that bear a strong producer-consumer relationship (and are not limited to frequently occurring sequences of instructions). Custom instructions, realized as hardware compositions effected at runtime, allow several instances of the same instruction to be active in parallel. A key distinguishing factor in the majority of emerging embedded applications is stream processing. To reduce the overhead of data transfer between custom instructions, direct communication paths are employed among them. In this article, we present an overview of the hardware-aware compiler framework, which determines the NoC-aware schedule of transports of the data exchanged between the custom instructions on the interconnect. The results for the FFT kernel indicate a 25% reduction in the number of loads/stores, and throughput improves by log(n) for an n-point FFT when compared to a sequential implementation. Overall, REDEFINE offers flexibility and runtime reconfigurability at the expense of 1.16× in power and 8× in area when compared to an ASIC. The REDEFINE implementation consumes 0.1× the power of an FPGA implementation. In addition, the configuration overhead of the FPGA implementation is 1,000× that of REDEFINE.



Articles published on Communication Patterns Of Applications

TLS fingerprint for encrypted malicious traffic detection with attributed graph kernel

Performance characterization of containerization for HPC workloads on InfiniBand clusters: an empirical study

QTMS: A quadratic time complexity topology-aware process mapping method for large-scale parallel applications on shared HPC system

A neural network method for solving the mapping problem of parallel applications

A Survey of Communication Performance Models for High-Performance Computing

EagerMap

Topology-aware job mapping

Topology mapping of irregular parallel applications on torus-connected supercomputers

An Erlang Implementation of Multiparty Session Actors

TransMap: Transformation Based Remapping and Parallelism for High Utilization and Energy Efficiency in CGRAs

On the design of a new dynamic credit-based end-to-end flow control mechanism for HPC clusters

A topology-aware method for scientific application deployment on cloud

Predictive and Distributed Routing Balancing, an Application-Aware Approach

Automatically optimized core mapping to subdomains of domain decomposition method on multicore parallel environments

Efficiently Acquiring Communication Traces for Large-Scale Parallel Applications

An abacus turn model for time/space-efficient reconfigurable routing

A NOC closed-loop performance monitor and adapter

The scalable process topology interface of MPI 2.2

REDEFINE

Low Diameter Interconnections for Routing in High-Performance Parallel Systems

