Abstract

The increasing variety and affordability of multi- and many-core embedded architectures can pose both a challenge and an opportunity to developers of high-performance computing applications. In this paper, we present a case study in which we develop and evaluate a unified parallel approach to a signal-correlation algorithm currently in use in a commercial/industrial locating system. We use both the HPX C++ and CUDA runtimes to achieve scalable code for current embedded multi- and many-core architectures (NVIDIA Tegra, Intel Broadwell M, Arm Cortex-A15). We also compare our approach against traditional high-performance hardware as well as a native embedded many-core variant. To increase the accuracy of our performance analysis, we introduce a dedicated performance model. The results show that our approach is feasible and enables us to harness the advantages of modern micro-server architectures, but they also indicate that some currently existing many-core embedded architectures have limitations that can lead to traditional hardware being superior in both efficiency and absolute performance.

