Abstract

Sparse Deep Neural Networks (DNNs) offer large improvements in model storage requirements, execution latency, and execution throughput. DNN pruning is contingent on knowing the model weights, so networks can be pruned only after training. A priori sparse neural networks have been proposed as a way to extend the benefits of sparsity to training as well. Selecting a topology a priori is also beneficial for hardware accelerator specialization, lowering power, chip area, and latency. We present NeuroFabric, a hardware-ML model co-design approach that jointly optimizes a sparse neural network topology and a hardware accelerator configuration. NeuroFabric replaces dense DNN layers with cascades of sparse layers with a specific topology. We present an efficient, data-agnostic method for sparse network topology optimization, and show that parallel butterfly networks with skip connections achieve the best accuracy independent of sparsity or depth. We also present a multi-objective optimization framework that finds a Pareto frontier of hardware-ML model configurations over six objectives: accuracy, parameter count, throughput, latency, power, and hardware area.
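
The abstract names butterfly topologies with skip connections but does not reproduce their construction. The sketch below is a minimal NumPy illustration of the standard butterfly connectivity pattern, in which stage s links unit i to unit i XOR 2^s, so a cascade of log2(n) layers with two nonzeros per row still connects every input to every output. The function name `butterfly_masks` and the ReLU-plus-skip wiring are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def butterfly_masks(n):
    """Connectivity masks for the log2(n) stages of a butterfly network.

    Stage s connects unit i to units i and i XOR 2**s, so the cascade of
    sparse layers (2 nonzeros per row) still links every input to every
    output, mimicking the dense layer it replaces.
    """
    assert n & (n - 1) == 0, "butterfly networks require a power-of-two width"
    stages = int(np.log2(n))
    masks = []
    for s in range(stages):
        mask = np.zeros((n, n), dtype=bool)
        for i in range(n):
            mask[i, i] = True              # straight-through connection
            mask[i, i ^ (1 << s)] = True   # butterfly partner at distance 2**s
        masks.append(mask)
    return masks

# Replace one dense n-by-n layer with a cascade of sparse butterfly layers
# (illustrative weights and activation; not the paper's training setup).
rng = np.random.default_rng(0)
n = 8
x = rng.standard_normal(n)
for mask in butterfly_masks(n):
    w = rng.standard_normal((n, n)) * mask   # weights exist only on mask edges
    x = np.maximum(w @ x, 0) + x             # ReLU plus a skip connection
print(x.shape)  # (8,)
```

Because every edge position is fixed before training, both the mask and a matching accelerator datapath can be specialized a priori, which is the co-design opportunity the abstract describes.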
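Similarly, the abstract describes a six-objective Pareto search without giving the selection rule. A standard non-dominated filter over (accuracy, parameter count, throughput, latency, power, area) might look like the sketch below; the candidate scores are invented for illustration, with accuracy and throughput negated so every column is minimized, and none of the numbers come from the paper.

```python
import numpy as np

def pareto_frontier(points):
    """Return the non-dominated rows of `points` (all objectives minimized).

    A configuration is on the frontier if no other configuration is at
    least as good on every objective and strictly better on at least one.
    """
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        dominated = np.any(
            np.all(pts <= p, axis=1) & np.any(pts < p, axis=1)
        )
        if not dominated:
            keep.append(i)
    return pts[keep]

# Toy candidates scored on six objectives:
# [-accuracy, params, -throughput, latency, power, area]
candidates = np.array([
    [-0.91, 1.2e6, -350.0, 4.1, 2.0, 10.0],
    [-0.89, 0.6e6, -500.0, 2.8, 1.4,  7.5],
    [-0.88, 0.9e6, -300.0, 5.0, 2.2, 11.0],  # dominated by the second row
])
print(pareto_frontier(candidates))  # keeps the first two rows
```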
