Abstract

Convolution is widely used in scientific computing fields such as digital image processing and machine learning. However, these applications are difficult to execute in real time because they are computationally intensive. This paper introduces a high-speed convolution solution that runs on our self-developed multicore digital signal processor (DSP). To optimize convolution performance, we propose a convolution instruction and a convolution microarchitecture in the design of a subcore. The designed subcore is integrated as a coprocessor into a network-on-chip (NoC)-based multicore DSP. For the implementation of multicore parallel convolution, an independent convolution task-partitioning and mapping scheme is proposed. Data-block storage and software prefetching mechanisms are used to hide the data transfer time behind the computation, improving computing efficiency. We also develop a data reuse strategy that effectively reduces the bandwidth requirements of multicore parallel convolution. The proposed methods are applied to correlation-based template matching, and the results show that our convolution computing approach greatly improves performance compared with the same operations run on a personal computer, a TMS320C6678 processor, and an NVIDIA Quadro 1000M graphics processing unit (GPU).
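For orientation, the sketch below shows the plain correlation-based template matching workload that the proposed hardware accelerates: a naive single-core C version in which the template is slid over the image and an inner product is computed at every position. The function and parameter names are our own illustration rather than the paper's API; the paper's contribution lies in issuing this inner-product kernel through the convolution instruction, partitioning the output tiles across subcores, and hiding the data movement with prefetching and reuse.

```c
#include <stddef.h>

/* Illustrative single-core reference for 2D correlation-based template
 * matching; names and layout are assumptions, not the paper's interface. */
void correlate2d(const float *image, size_t img_h, size_t img_w,
                 const float *templ, size_t tpl_h, size_t tpl_w,
                 float *score /* size (img_h - tpl_h + 1) x (img_w - tpl_w + 1) */)
{
    size_t out_h = img_h - tpl_h + 1;
    size_t out_w = img_w - tpl_w + 1;

    for (size_t y = 0; y < out_h; ++y) {
        for (size_t x = 0; x < out_w; ++x) {
            float acc = 0.0f;
            /* Inner product of the template with the image window at (y, x);
             * this is the per-output-pixel work that each subcore would
             * execute on its assigned tile. */
            for (size_t ty = 0; ty < tpl_h; ++ty)
                for (size_t tx = 0; tx < tpl_w; ++tx)
                    acc += image[(y + ty) * img_w + (x + tx)]
                         * templ[ty * tpl_w + tx];
            score[y * out_w + x] = acc;
        }
    }
}
```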
