Abstract

On-chip computing platforms are evolving from single-core bus-based systems to many-core network-based systems, which are referred to as On-chip Large-scale Parallel Computing Architectures (OLPCs) in this paper. Homogeneous OLPCs feature strong regularity and scalability because of their identical cores and routers. Data-parallel applications partition their data into parallel subsets that are handled individually by the same program running on different cores, so they can obtain good speedup on homogeneous OLPCs. This paper addresses modeling the speedup performance of homogeneous OLPCs for data-parallel applications. In establishing the speedup model, the network communication latency and the ways data-parallel applications store their data are modeled and analyzed in detail. Two abstract concepts, the equivalent serial packet and the equivalent serial communication, are proposed to construct the network communication latency model, and the uniform and hotspot traffic models are adopted to reflect the ways of storing data. Several useful suggestions are drawn from the analysis of the performance model. Finally, three data-parallel applications are run on our cycle-accurate homogeneous OLPC experimental platform to validate the analytic results and to demonstrate that our study provides a feasible way to estimate and evaluate the performance of data-parallel applications on homogeneous OLPCs.

Highlights

  • As technology advances, on-chip computing platforms are evolving from single-core bus-based systems to many-core network-based systems, which integrate a number of computing cores that run in parallel and adopt an on-chip network that provides concurrent pipelined communication

  • On-chip computing platforms are evolving from single-core bus-based systems to many-core network-based systems, which are referred to as On-chip Large-scale Parallel Computing Architectures (OLPCs) in this paper

  • The real speedups of the three applications are calculated from the simulation results on our homogeneous OLPC experimental platform


Summary

Introduction

On-chip computing platforms are evolving from single-core bus-based systems to many-core network-based systems, which integrate a number of computing cores that run in parallel and adopt an on-chip network that provides concurrent pipelined communication. These many-core network-based systems are referred to as On-chip Large-scale Parallel Computing Architectures (OLPCs) in this paper. Data-parallel applications have a parallel data set that can be partitioned into data subsets, each of which can be handled individually by the same program with marginal synchronization overhead; they are therefore highly scalable and can exploit the potential of multiple computing cores. A large number of computing cores can cooperate in parallel to deliver higher performance for such applications. Understanding the speedup potential that OLPC computing platforms can offer is thus a fundamental question in the continual pursuit of higher performance.
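As a rough orientation, a minimal sketch of the general form such a speedup model can take, assuming a fixed-size data-parallel workload split evenly over N identical cores, a serial computation time T_comp, and a network communication term T_comm(N) whose growth depends on the traffic pattern (these symbols and this decomposition are illustrative only and are not the paper's equivalent-serial-communication formulation):

S(N) = T(1) / T(N) ≈ T_comp / (T_comp / N + T_comm(N))

Under a uniform traffic model, where data are spread evenly across the cores' local memories, T_comm(N) grows roughly with the average hop count of the on-chip network; under a hotspot traffic model, where data are concentrated at a shared storage node, contention at that node makes T_comm(N) grow faster and caps the achievable speedup.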

