Abstract

In the domain of data parallel computation, most works focus on data flow optimization inside the PE array and favorable memory hierarchy to pursue the maximum parallelism and efficiency, while the importance of data contents has been overlooked for a long time. As we observe, for structured data, insights on the contents (i.e., their values and locations within a structured form) can greatly benefit the computation performance, as fine-grained data manipulation can be performed. In this paper, we claim that by providing a flexible and adaptive data path, an efficient architecture with capability of fine-grained data manipulation can be built. Specifically, we design SOM, a portable and highly-adaptive data transmission network, with the capability of operand sorting, non-blocking self-route ordering and multicasting. Based on SOM, we propose the processing-in-transmission architecture (PITA), which extends the traditional SIMD architecture to perform some fundamental data processing during its transmission, by embedding multiple levels of SOM networks on the data path. We evaluate the performance of PITA in two irregular computation problems. We first map the matrix inversion task onto PITA and show considerable performance gain can be achieved, resulting in 3x-20x speedup against Intel MKL, and 20x-40x against cuBLAS. Then we evaluate our PITA on sparse CNNs. The results indicate that PITA can greatly improve computation efficiency and reduce memory bandwidth pressure. We achieved 2x-9x speedup against several state-of-art accelerators on sparse CNN, where nearly 100 percent PE efficiency is maintained under high sparsity. We believe the concept of PIT is a promising computing paradigm that can enlarge the capability of traditional parallel architecture.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call