Evaluation of External Memory Access Performance on a High-End FPGA Hybrid Computer

Konstantinos Kalaitzis, Ioannis Papaefstathiou, Apostolos Dollas, Evripidis Sotiriadis

doi:10.3390/computation4040041

Open Access

https://doi.org/10.3390/computation4040041

Journal: Computation
Publication Date: Oct 25, 2016
Citations: 1
License type: CC BY 4.0

Affiliation: Technical University of Crete

Abstract

The motivation of this research was to evaluate the main memory performance of a hybrid supercomputer such as the Convey HC-x, and to ascertain how its memory controller performs in several access scenarios vis-à-vis hand-coded memory prefetches. Such memory access patterns are very useful in stencil computations. The theoretical bandwidth of the Convey memory is compared with the results of our measurements. An accurate study of the memory subsystem is particularly useful to users developing their own application-specific personalities. Experiments were performed to measure the bandwidth between the coprocessor and the memory subsystem, aimed mainly at the read access speed of the memory from the Application Engines (FPGAs). Different ways of accessing data were tried in order to find the most efficient way to access memory, and this way is proposed for future work on the Convey HC-x. When performing a series of accesses to memory, non-uniform latencies occur, and the Memory Controller in the Convey HC-x coprocessor attempts to hide this latency. We measure memory efficiency as the ratio of the number of memory accesses to the number of execution cycles. The result of this measurement converges to one in most cases. In addition, we performed experiments with hand-coded memory accesses. The analysis of the experimental results shows how the memory subsystem and the Memory Controllers work. From this work we conclude that the memory controllers do an excellent job, largely because (transparently to the user) they seem to cache large amounts of data, and hence hand-coding is not needed in most situations.
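
The efficiency metric described in the abstract can be written compactly as follows. The symbols are illustrative labels introduced here, not notation taken from the paper:

```latex
\[
  \eta \;=\; \frac{N_{\text{accesses}}}{N_{\text{cycles}}}
\]
```

Here $N_{\text{accesses}}$ is the number of memory accesses and $N_{\text{cycles}}$ the number of execution cycles. A value of $\eta$ close to one means roughly one memory access per execution cycle, i.e., the access latency is almost entirely hidden.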

Highlights

From this work we conclude that the memory controllers do an excellent job, largely because they seem to cache large amounts of data, and hand-coding is not needed in most situations.
Stencil computations invariably need large volumes of data, which have to be fetched from external memory, typically some form of external dynamic memory.
The gather-scatter mechanisms, which are very useful for very sparse matrices and other highly irregular data, are not considered: the complete lack of structure would give the programmer no incentive to hand-code the accesses, whereas the case studies that we evaluated do have some structure that a smart memory controller could uncover.

Summary

Introduction

Stencil computations invariably need large volumes of data, which have to be fetched from external memory, typically some form of external dynamic memory. Vector supercomputers of the 1970s–1990s, such as the Cray-1, Cray-2, CDC-205, Fujitsu VP2600/10, NEC SX series, Convey C1, etc., supported fast memory accesses with “strides”, i.e., fixed distances in memory between successive accesses, as these are needed for matrix column and matrix diagonal accesses; these machines used static (non-dynamic) memory with a fixed access time.
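
To make the notion of a stride concrete, the following minimal sketch computes the byte addresses touched when reading a column or the main diagonal of a matrix. It assumes a row-major layout and 8-byte elements; the function names and parameters are hypothetical and chosen only for illustration, not taken from the paper or from any Convey interface.

```python
def strided_addresses(base, stride, count, elem_size=8):
    """Byte addresses of `count` elements, `stride` elements apart, starting at `base`."""
    return [base + i * stride * elem_size for i in range(count)]


def column_addresses(base, n_rows, n_cols, col, elem_size=8):
    """Addresses of one column of a row-major n_rows x n_cols matrix.

    Consecutive elements of a column are n_cols elements apart,
    so the stride equals the row length.
    """
    return strided_addresses(base + col * elem_size, n_cols, n_rows, elem_size)


def diagonal_addresses(base, n, elem_size=8):
    """Addresses of the main diagonal of a row-major n x n matrix.

    Moving one step along the diagonal advances one row and one column,
    so the stride is n + 1 elements.
    """
    return strided_addresses(base, n + 1, n, elem_size)
```

For a 3 x 4 matrix, a column access therefore has a stride of 4 elements, while the diagonal of a 3 x 3 matrix has a stride of 4 elements as well (n + 1). In both cases the distance between successive accesses is fixed, which is the property the strided memory accesses of those machines relied on.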
