Large-scale discrete convolution, well known to be computationally intensive, is a fundamental algorithmic building block in many computer vision and artificial intelligence applications. This work presents a novel stochastic-based hardware architecture and design that computes discrete convolution based on the widely used convolution theorem. Our approach has three advantages. First, it achieves approximately <inline-formula><tex-math notation="LaTeX">${\mathrm O} (1)$</tex-math></inline-formula> algorithmic complexity for any given absolute error bound <inline-formula><tex-math notation="LaTeX">$d$</tex-math></inline-formula> and any given input vector size <inline-formula><tex-math notation="LaTeX">$N$</tex-math></inline-formula>. This complexity, compared with the <inline-formula><tex-math notation="LaTeX">${\mathrm O} (N^2)$</tex-math></inline-formula> and <inline-formula><tex-math notation="LaTeX">${\mathrm O} (N \log N)$</tex-math></inline-formula> operations required by conventional multiplier-based and FFT-based architectures, respectively, represents a significant improvement, albeit at the cost of degraded computing accuracy. Second, we prove analytically that our stochastic-based convolution requires only a moderate number of random samples to reach a given computing accuracy; for example, 788 random samples achieve 95 percent accuracy at a 99 percent confidence level for a convolution with <inline-formula><tex-math notation="LaTeX">$N=128$</tex-math></inline-formula>. Third, the proposed stochastic-based architecture is highly fault-tolerant because the information being processed is encoded across a large ensemble of random samples; local perturbations in computing accuracy are therefore dissipated globally and become inconsequential to the final overall results.
We believe that, being highly scalable and energy efficient, our stochastic-based convolution architecture is well suited for many real-time embedded applications, especially perception-based computing tasks that are inherently fault-tolerant. In short, this work provides an elegant way to trade off computing accuracy against computing performance and hardware efficiency for many real-world convolution-based applications.
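The accuracy-versus-samples tradeoff described above can be illustrated with a minimal Monte Carlo sketch. This is a plain software illustration, not the authors' hardware architecture or their convolution-theorem-based estimator: it estimates a single circular-convolution output from a uniformly sampled subset of the summation terms, so the estimate tightens as the number of random samples grows. The function name `conv_output_mc` and the uniform index sampling are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_output_mc(x, y, n, num_samples):
    """Monte Carlo estimate of one circular-convolution output
    z[n] = sum_k x[k] * y[(n - k) mod N], using a random subset of terms."""
    N = len(x)
    ks = rng.integers(0, N, size=num_samples)   # uniform random summation indices
    terms = x[ks] * y[(n - ks) % N]             # sampled terms of the convolution sum
    return N * terms.mean()                     # unbiased estimator: scale sample mean by N

N = 128
x = rng.standard_normal(N)
y = rng.standard_normal(N)

# Exact circular convolution via the convolution theorem (FFT), for comparison.
exact = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

# Estimate z[5] from 788 random samples (the sample count quoted in the abstract).
approx = conv_output_mc(x, y, n=5, num_samples=788)
print(abs(approx - exact[5]))
```

Because each output is estimated from a fixed number of samples regardless of <inline-formula><tex-math notation="LaTeX">$N$</tex-math></inline-formula>, the per-output work is constant, which is the intuition behind the approximately <inline-formula><tex-math notation="LaTeX">${\mathrm O} (1)$</tex-math></inline-formula> complexity claim.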