Abstract

Commercial multicore central processing units (CPUs) integrate a number of processor cores on a single chip to support parallel execution of computational tasks. For independent parallel tasks, multicore CPUs can improve performance over single cores nearly linearly as long as sufficient bandwidth is available. Ideal speedup is, however, difficult to achieve when dense intercommunication between the cores or complex memory access patterns are required. This is caused by expensive synchronization and thread switching, and by insufficient latency tolerance. These factors push programmers away from straightforward parallel processing patterns toward complex and error-prone programming techniques. To address these problems, we have introduced the Thick Control Flow (TCF) Processor Architecture. A TCF is an abstraction of parallel computation that combines self-similar threads into computational entities. In this paper, we compare the performance and programmability of an entry-level TCF processor and two Intel Skylake multicore CPUs on commonly used parallel kernels to find out how well our architecture solves the issues that greatly reduce the productivity of parallel software development. Code examples are given and programming experiences recorded.

Highlights

  • Multicore Central Processing Units (CPUs) are the workhorses of modern general purpose computing devices, such as workstations, tablets and smartphones

  • We evaluated an entry-level Thick Control Flow (TCF) processor, the Thick Control Flow Processor Architecture (TPA)-16, against Intel Skylake client and server multicore CPUs, the Core i7 and the Xeon W

  • The comparison was carried out by writing similar parallel programs for all processors using popular programming solutions (Pthreads, OpenMP and the baseline TCF language), measuring execution times with a clock-accurate simulator (TPA) and on actual computers (Skylake CPUs), and counting the active code lines of the programs; a hedged sketch of one such kernel follows this list
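
To make the methodology concrete, the following is a minimal sketch of the kind of kernel written for the comparison: an element-wise matrix sum (matsum) parallelized with Pthreads. This is an illustrative assumption, not the paper's benchmark code; the names N, NTHREADS, worker and range_t are invented for the example.

/* Hedged sketch of a Pthreads matsum kernel (illustrative only; not the
 * paper's benchmark code). Each worker sums a contiguous block of rows. */
#include <pthread.h>

#define N        1024          /* matrix dimension (assumed)  */
#define NTHREADS 8             /* worker count (assumed)      */

static double a[N][N], b[N][N], c[N][N];

typedef struct { int first_row, last_row; } range_t;

static void *worker(void *arg)
{
    range_t *r = (range_t *)arg;
    for (int i = r->first_row; i < r->last_row; i++)
        for (int j = 0; j < N; j++)
            c[i][j] = a[i][j] + b[i][j];
    return NULL;
}

void matsum_pthreads(void)
{
    pthread_t tid[NTHREADS];
    range_t   rng[NTHREADS];
    int rows = N / NTHREADS;   /* assumes N divisible by NTHREADS */

    for (int t = 0; t < NTHREADS; t++) {
        rng[t].first_row = t * rows;
        rng[t].last_row  = (t + 1) * rows;
        pthread_create(&tid[t], NULL, worker, &rng[t]);
    }
    for (int t = 0; t < NTHREADS; t++)   /* joining acts as the barrier */
        pthread_join(tid[t], NULL);
}

Even in this trivial kernel, explicit thread creation, work partitioning and joining add code that has no counterpart in the textbook sequential version; this is the overhead that the active-code-line counts are meant to capture.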

Summary

Introduction

Multicore Central Processing Units (CPUs) are the workhorses of modern general purpose computing devices, such as workstations, tablets and smartphones. Programmers often cannot employ natural, straightforward parallel processing patterns, but have to replace them with more complex and error-prone structures [3], as our experiments confirm. This shows up as extra code lines compared to the textbook counterparts of matmul and matsum [4, 5]; on the TCF architecture, those textbook kernels (4–6 and 5–9 active code lines, respectively) reduce in both cases to a single code line containing just a parallel statement, with no for-loops and no explicit synchronization (see the sketch below). The fibers within a TCF are executed synchronously with respect to each other in order to simplify parallel programming.
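As a point of reference for the line counts above, here is the textbook matmul kernel in C; the loop structure and names are generic assumptions rather than the code measured in the paper. The closing comment only paraphrases how the paper describes the baseline TCF language version (a single parallel statement); the syntax hinted at there is purely hypothetical.

/* Textbook dense matrix multiplication, C = A * B (illustrative sketch).
 * This is the style of kernel whose active code lines are counted. */
void matmul(int n, const double *a, const double *b, double *c)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* On TPA, the paper states that the same kernel collapses to a single
 * parallel statement with no for-loops and no explicit synchronization.
 * A purely hypothetical rendering (not actual baseline TCF language
 * syntax) might read roughly as:
 *     c[i][j] = reduce(+, k, a[i][k] * b[k][j]);
 */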

Related work
Contribution
Hardware architectures
Xeon W
Programming methodologies
Thick control flows
POSIX threads
OpenMP
Comparison
Quantitative measurements
Overall tests
OpenMP and sequential notation
The effect of access patterns
Factors of efficient programming
Programming experiences
Findings
Conclusions