Abstract

Proposed is a parallel array histogram architecture (PAHA) suitable for embedded implementations. The PAHA uses a register array instead of a memory block to store the histogram bins. In each step, M inputs can be processed in parallel to update the histogram bins without any additional latency. Also described is a second version of the PAHA with a flexible number of inputs, potentially avoiding the need for multiple PAHAs in a single application. Implementation results show that the architecture can achieve a super-linear speed-up of 43.75× for a 16-way PAHA when compared to a software implementation in a general-purpose processor.
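
As a rough illustration of the update rule described above, the following behavioural sketch models a register array in which every bin checks all M inputs in a single step. It is not the authors' hardware design. The function names `paha_step` and `paha_histogram`, the per-bin compare-and-count scheme, and the choice to ignore out-of-range inputs are all assumptions made for this example.

```python
def paha_step(bins, inputs):
    """Model one step: each bin adds the number of inputs equal to its index.

    Assumption: inputs outside the range 0..len(bins)-1 match no bin and are ignored.
    """
    return [count + sum(1 for x in inputs if x == idx)
            for idx, count in enumerate(bins)]


def paha_histogram(data, num_bins, m):
    """Consume `data` in groups of up to `m` values, one group per modelled step.

    Returns the final bin counts and the number of steps taken.
    """
    bins = [0] * num_bins  # stands in for the register array
    steps = 0
    for start in range(0, len(data), m):
        bins = paha_step(bins, data[start:start + m])
        steps += 1
    return bins, steps


if __name__ == "__main__":
    counts, steps = paha_histogram([0, 1, 1, 3, 3, 3, 2, 0], num_bins=4, m=4)
    print(counts, steps)
```

In this model the number of steps falls from N to ceil(N / M) for N inputs. That accounts only for the linear part of a speed-up. The model says nothing about where the reported super-linear figure comes from.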
