Abstract
Proposed is a parallel array histogram architecture (PAHA) suitable for embedded implementations. The PAHA uses a register array instead of a memory block to store the histogram bins. In each step, M inputs can be processed in parallel to update the histogram bins without any additional latency. Also described is a second version of the PAHA with a flexible number of inputs, potentially avoiding the need for multiple PAHAs in a single application. Implementation results show that the architecture can achieve a super-linear speed-up of 43.75× for a 16-way PAHA when compared to a software implementation in a general-purpose processor.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have