Abstract

Heterogeneous parallel platforms, comprising multiple processing units and architectures, have become a cornerstone in improving the overall performance and energy efficiency of scientific and engineering applications. Nevertheless, taking full advantage of their resources comes with a variety of difficulties: developers need technical expertise in different parallel programming frameworks and prior knowledge of the algorithms the application uses underneath. To alleviate this burden, we present an adaptive offline implementation selector that allows users to better exploit the resources provided by heterogeneous platforms. Specifically, this framework selects, at compile time, the device-implementation tuple that delivers the best performance on a given platform. The user interface of the framework leverages two C++ language features: attributes and concepts. To evaluate the benefits of this framework, we analyse the global performance and convergence of the selector using two different use cases. The experimental results demonstrate that the proposed framework allows users to enhance performance while minimizing the effort needed to tune applications targeted to heterogeneous platforms. Furthermore, we also demonstrate that our framework delivers performance comparable to that of other approaches.

Highlights

  • In recent years, heterogeneous parallel architectures have offered better performance and energy efficiency than other alternatives

  • This section gives a brief overview of the two C++ language features used to develop the implementation selector interface: C++ attributes and concepts

  • We evaluate the ability of the selector to adapt and make appropriate decisions when a new device is attached to the heterogeneous platform every 100 training iterations


Summary

Introduction

Heterogeneous parallel architectures have offered better performance and energy efficiency than other alternatives. However, platforms comprising diverse devices (such as multi-cores, GPUs, DSPs and FPGAs) are notoriously difficult to program effectively, since they demand distinct frameworks and application programming interfaces [5]. This fact has led to multiple implementations of the same algorithm, each targeted to a different device. In order to improve performance, developers need to analyze a priori the target platform and the application, along with its implementation alternatives and available libraries. To achieve this goal, some aspects need to be considered. An alternative to the aforementioned technique is to shift the decision-making task to compile time. Several proposals that leverage this static approach, based on analytic models, machine learning and adaptive optimization methods, can be found in the literature [1].
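To make the idea of moving the decision to compile time more concrete, the following is a minimal C++17 sketch, not the interface of the framework described here. It assumes a table of previously profiled run times and resolves the fastest device-implementation pair entirely at compile time. The names `Measurement`, `fastest`, `kProfile` and `kChosen`, as well as the device names and timings, are hypothetical and chosen only for illustration.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical profiling record: one measured run time per
// (device, implementation) pair. Not taken from the paper.
struct Measurement {
    std::string_view device;
    std::string_view implementation;
    double seconds;
};

// Return the index of the entry with the smallest run time.
// Usable in constant expressions, so the choice can be fixed at compile time.
template <std::size_t N>
constexpr std::size_t fastest(const std::array<Measurement, N>& table) {
    static_assert(N > 0, "the profiling table must not be empty");
    std::size_t best = 0;
    for (std::size_t i = 1; i < N; ++i) {
        if (table[i].seconds < table[best].seconds) {
            best = i;
        }
    }
    return best;
}

// Made-up timings standing in for results gathered in an offline training phase.
inline constexpr std::array<Measurement, 3> kProfile{{
    {"cpu", "openmp", 4.2},
    {"gpu", "cuda", 1.3},
    {"gpu", "opencl", 1.9},
}};

// The selection itself: evaluated by the compiler, not at run time.
inline constexpr std::size_t kChosen = fastest(kProfile);

static_assert(kProfile[kChosen].device == "gpu");
static_assert(kProfile[kChosen].implementation == "cuda");
```

In a real setting the table would be produced by an offline profiling or training step rather than written by hand, and the selected index would drive which implementation gets compiled in, for example through `if constexpr` or template specialization.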

Related Work
Background
The hardware parallel platform description language
The adaptive offline implementation selector
The attributes-based interface
The selector module
Experimental evaluation
Evaluation of the accuracy and performance
Evaluation of the adaptability
Comparison with alternative approaches
Conclusions
