Abstract

Graphics processing units (GPUs) are many-core architectures that deliver high performance by exploiting large degrees of data-level parallelism through the single instruction, multiple threads (SIMT) execution model. GPUs can accelerate diverse classes of applications, including recognition, gaming, data analytics, weather prediction, and multimedia. Many of these applications are amenable to approximate execution, which provides an opportunity to improve GPU performance and energy efficiency. Among approximation techniques, neural accelerators have been shown to provide significant performance and efficiency gains. This chapter describes our neurally accelerated GPU architecture, which embeds neural acceleration within GPU accelerators without hindering their SIMT execution while keeping hardware changes minimal.

