Abstract
The paper presents MC3A- Markov Chain Monte Carlo Many Core Accelerator, a high-throughput, domain-specific, programmable manycore accelerator, which effectively generates samples from a provided target distribution. MCMC samplers are used in machine learning, image and signal processing applications that are computationally intensive. In such scenarios, high-throughput samplers are of paramount importance. To achieve a high-throughput platform, we add two domain-specific instructions with dedicated hardware whose functions are extensively used in MCMC algorithms. These instructions bring down the number of clock cycles needed to implement the respective functions by 10x and 21x. A 64-cluster architecture of the MC3A is fully placed and routed in 65 nm, TSMC CMOS technology, where the VLSI layout of each cluster occupies an area of 0.577 mm^2 while consuming a power of 247 mW running at 1 GHz clock frequency. Our proposed MC3A achieves 6x higher throughput than its equivalent predecessor (PENC) and consumes 4x lower energy per sample. Also, when compared to other off-the-shelf platforms, such as Jetson TX1 and TX2 SoC, MC3A results in 195x and 191x higher throughput and consumes 808x and 726x lower energy per sample generation, respectively.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have