Abstract

A new high-performance, scalable systolic array processor architecture module is presented that can simultaneously convolve k different (n × n) Filter Coefficient (FC) planes with a single (i × j) pixel Input Image Plane (IP). The architecture is designed to simultaneously convolve k different (n × n) FC planes with 600 dpi (dots per inch) IPs of size 8½” × 11”, outputting k convolved Output Image (OI) plane pixels every system clock cycle, with a system clock cycle time of less than 10 nanoseconds. Bit-parallel arithmetic is used; each IP pixel is 8 bits wide and each FC plane coefficient is 6 bits wide. A new pipelined systolic-type architecture module is first developed that generates one convolved OI plane pixel per system clock cycle using a level of 'r' hardware resources for the case n = 5. The architecture is then extended, in a scalable and more deeply pipelined manner, to convolve a single IP pixel simultaneously with k different (n × n) FC planes for the case n = 5 within one system clock cycle, using fewer than (k × r) hardware resources. Synthesis and post-implementation VHDL simulation results are presented for an experimental model of the architecture; these results validate its scalability and functionality. Simulation results show that the performance of the architecture is directly proportional to pipeline depth.
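To make the computation concrete, the following is a minimal Python reference model (a plain software sketch, not the systolic architecture itself) of what the hardware must produce: for each output position, k convolution sums, one per FC plane, all over the same IP window. The function names (convolve_k_planes, accumulator_bits, page_time_seconds) are hypothetical. The sketch assumes that the 6-bit coefficients are signed two's complement values and the 8-bit pixels are unsigned, which the abstract does not specify, and it computes only "valid" outputs, ignoring image borders. It also derives two figures under those assumptions: the accumulator width a bit-parallel sum of n² such products would need, and the time to stream one 600 dpi 8½” × 11” page (5100 × 6600 pixels) at one IP pixel per 10 ns clock, ignoring pipeline fill latency.

```python
PIXEL_BITS = 8  # IP pixel width, as stated in the abstract
COEFF_BITS = 6  # FC coefficient width, as stated in the abstract


def convolve_k_planes(image, filters):
    """Software reference model: convolve one image with k (n x n) FC planes.

    Assumptions (not from the abstract): pixels are unsigned, coefficients
    are signed two's complement, and only 'valid' outputs are produced.
    Returns a list of k output planes, each (rows - n + 1) x (cols - n + 1).
    """
    rows, cols = len(image), len(image[0])
    n = len(filters[0])
    lo, hi = -(1 << (COEFF_BITS - 1)), (1 << (COEFF_BITS - 1)) - 1
    for f in filters:
        for row in f:
            for w in row:
                if not lo <= w <= hi:
                    raise ValueError(f"coefficient {w} does not fit in {COEFF_BITS} bits")
    for row in image:
        for p in row:
            if not 0 <= p < (1 << PIXEL_BITS):
                raise ValueError(f"pixel {p} does not fit in {PIXEL_BITS} bits")

    out_r, out_c = rows - n + 1, cols - n + 1
    out = [[[0] * out_c for _ in range(out_r)] for _ in filters]
    for r in range(out_r):
        for c in range(out_c):
            # Every FC plane sees the same n x n window, mirroring the goal of
            # emitting k OI pixels for each IP position.
            for plane, f in enumerate(filters):
                acc = 0
                for u in range(n):
                    for v in range(n):
                        # True convolution: the kernel index is flipped.
                        acc += image[r + u][c + v] * f[n - 1 - u][n - 1 - v]
                out[plane][r][c] = acc
    return out


def accumulator_bits(n, pixel_bits=PIXEL_BITS, coeff_bits=COEFF_BITS):
    """Smallest signed width holding any sum of n*n pixel*coefficient products
    (assumes unsigned pixels and signed two's complement coefficients)."""
    max_pix = (1 << pixel_bits) - 1
    most_neg = -(1 << (coeff_bits - 1)) * max_pix * n * n
    most_pos = ((1 << (coeff_bits - 1)) - 1) * max_pix * n * n
    bits = 1
    while not (-(1 << (bits - 1)) <= most_neg and most_pos <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


def page_time_seconds(dpi=600, width_in=8.5, height_in=11.0, clock_ns=10.0):
    """Time to stream one page at one IP pixel per clock (no pipeline fill)."""
    pixels = round(dpi * width_in) * round(dpi * height_in)
    return pixels * clock_ns * 1e-9
```

Under these assumptions, accumulator_bits(5) evaluates to 19, and page_time_seconds() gives about 0.34 s per page for all k planes together, since k OI pixels are emitted per cycle.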
