Abstract

As deep learning models grow in size to achieve state-of-the-art accuracy, there is a pressing need for compact models. To address this challenge, we introduce a novel operation called Personal Self-Attention (PSA). It is specifically designed to learn non-linear 1-D functions, enhancing existing spline-based methods while remaining compatible with gradient backpropagation. By combining these non-linear functions with linear transformations, we match the accuracy of larger models with significantly smaller hidden dimensions, which is crucial for FPGA implementations. Because MLP-based architectures are gaining popularity through their widespread use in large language models, we evaluate PSA by integrating it into the MLP-based vision model ResMLP and testing it on the CIFAR-10 classification task. Our results confirm that PSA achieves equivalent accuracy with a 2x smaller hidden size than conventional MLPs. Furthermore, by quantizing our non-linear function into a simple lookup table, we reduce the number of operations required by 28%-45%, which offers significant benefits for hardware accelerators. To showcase this, we design an end-to-end unrolled streaming accelerator for ResMLP and demonstrate that our compressed model maintains 88% accuracy while reducing LUT + DSP resource requirements by 25% and doubling throughput to 32 kFPS. Additionally, we implement a fixed-size SIMD accelerator for the same compressed model that achieves a 62.1% improvement in throughput while consuming only 3.5% extra LUTs.
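
To make the approach concrete, the sketch below shows one way a learnable 1-D non-linear function of the kind described above, together with its quantization into a lookup table, could be implemented. This is a minimal illustration under our own assumptions (PyTorch, fixed evenly spaced knots; the names PiecewiseLinear1D, num_knots, and x_range are hypothetical), not the paper's actual PSA implementation.

    # Sketch only: a backprop-compatible, learnable 1-D piecewise-linear function,
    # in the spirit of the spline-based non-linearities the abstract describes.
    import torch
    import torch.nn as nn

    class PiecewiseLinear1D(nn.Module):
        """Learnable 1-D function: linear interpolation between trainable knot values."""

        def __init__(self, num_knots: int = 16, x_range: float = 4.0):
            super().__init__()
            self.x_range = x_range
            # Fixed, evenly spaced knot positions in [-x_range, x_range].
            self.register_buffer("knots_x", torch.linspace(-x_range, x_range, num_knots))
            # Trainable knot values, initialised to the identity function.
            self.knots_y = nn.Parameter(self.knots_x.clone())

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Map inputs to a fractional knot index, clamped to the covered range.
            n = self.knots_x.numel()
            pos = (x.clamp(-self.x_range, self.x_range) + self.x_range) \
                  / (2 * self.x_range) * (n - 1)
            lo = pos.floor().long().clamp(max=n - 2)
            frac = pos - lo.float()
            # Linear interpolation between neighbouring knot values; gradients
            # flow into knots_y through ordinary backpropagation.
            return (1 - frac) * self.knots_y[lo] + frac * self.knots_y[lo + 1]

        @torch.no_grad()
        def to_lookup_table(self, bits: int = 8) -> torch.Tensor:
            # Tabulate the learned function at 2**bits input points so inference
            # reduces to a single table read, as suggested for hardware accelerators.
            xs = torch.linspace(-self.x_range, self.x_range, 2 ** bits,
                                device=self.knots_x.device)
            return self.forward(xs)

In such a setup, each learned curve would follow a linear layer, so the extra expressiveness of the non-linearity stands in for the wider hidden dimension of a plain MLP block.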
