Abstract

Systolic arrays (SAs) are efficient architectures for multimedia processing, database management, and scientific computing applications, all of which are characterized by a large number of data accesses. In these data-transfer- and storage-intensive applications, however, memory access is often the factor that limits computation speed. Since the memory subsystem dominates the cost (area), performance, and power consumption of the SA, special attention must be paid to how the memory subsystem can benefit from customization. In this paper we consider the memory organization of a linear systolic array with bi-directional links (called BLSA), which is suitable for implementing a broad class of algorithms. We assume that memory is organized into smaller, distributed physical memory modules. To provide high-bandwidth data access, we have designed special hardware called the address generator unit (AGU). The function of the AGU is threefold. First, during initialization, it transforms the host address space into the BLSA address space. Second, it provides efficient access to memory data during BLSA operation. Third, it performs fast data transfer between the BLSA and the host at the end of the computation. We examine the area and performance impact of the memory-access circuitry when the computationally intensive offset address calculations otherwise performed in software are eliminated by implementing the required address transformations in the AGUs. With hardware AGUs we achieved a speedup of approximately two compared to the software implementation of address calculation, with a hardware overhead of only 7.6% in the worst case.
