The discrete Hadamard transform (DHT) is a signal processing tool that decomposes an arbitrary input vector into a superposition of Walsh functions. Because of its wide range of applications in big data processing, a fast, energy-efficient, high-throughput hardware design for DHT is essential. Processing in memory (PIM) enables in-place computation, which reduces data traffic, a major speed bottleneck in existing computing systems. In this work, we propose an efficient hybrid parallel PIM-based computation for DHT. Our method exploits the recursive computation of DHT and is based on memristor-aided logic (MAGIC) gates, in which arithmetic operations are carried out using simple logic NOR operations. We propose two in-memory computing methods for the DHT encoding process. At the arithmetic level, the first method, called MAGIC-DHT-1D, improves efficiency by sharing intermediate results between addition and subtraction in DHT, and provides an average speedup of 1.12× over the recently proposed DigitalPIM for 1D DHT. Furthermore, MAGIC-DHT-1D outperforms SIMPLER in terms of energy and energy density on average. The second method, called MAGIC-DHT-2D, shares the carrier-independent computation cycles among multi-bit parallel addition and subtraction. At the algorithm level, we also explore both row-based and column-based PIM NOR computing in the same crossbar to avoid the transposition operation required in the 2D DHT process. MAGIC-DHT-2D provides an average speedup of 4.84× and 7.25× over two state-of-the-art methods, DigitalPIM and SIMPLER, respectively, for each complete set of 2D DHT computing cycles. Our numerical results further show that the proposed optimized methods can achieve up to 56.19× and 6.90× speedup, as well as 57.84× and 5.96× higher throughput, over an NVIDIA RTX Titan GPU when computing 1D DHT and 2D DHT, respectively.
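
For readers unfamiliar with the recursive structure mentioned above, the following is a minimal software sketch of the standard fast Walsh-Hadamard butterfly, not the in-memory MAGIC design itself. It illustrates why every stage produces an addition and a subtraction of the same operand pair, and how a 2D transform can be obtained by applying the 1D transform along rows and then along columns. The function names `fwht` and `fwht_2d` are hypothetical, the transform is left unnormalized, and natural (Hadamard) ordering is assumed.

```python
def fwht(values):
    """Unnormalized 1D fast Walsh-Hadamard transform (natural ordering).

    Illustrative software sketch only; it does not model NOR gates,
    memristor crossbars, or any timing behavior.
    """
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("input length must be a power of two")
    h = 1
    while h < n:
        # Each stage pairs elements that are h apart within blocks of 2*h.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                # The same operand pair feeds both the sum and the difference.
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def fwht_2d(matrix):
    """Unnormalized 2D transform: 1D transform on rows, then on columns."""
    rows = [fwht(row) for row in matrix]
    # In software the column pass needs an explicit transpose via zip(*...).
    cols = [fwht(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


if __name__ == "__main__":
    print(fwht([1, 0, 1, 0, 0, 1, 1, 0]))
    print(fwht_2d([[1, 2], [3, 4]]))
```

In this sketch the column pass relies on an explicit transpose; the abstract describes avoiding that step in hardware by computing along both rows and columns of the same crossbar.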