This tutorial paper addresses a low-power computer vision system as an example of a growing application domain of neural networks, surveying techniques developed to preserve accuracy within the resource and performance constraints imposed by the hardware platform. For a given hardware platform and network model, software optimization techniques, including pruning, quantization, low-rank approximation, and parallelization, aim to satisfy the resource and performance constraints while minimizing the loss of accuracy. Because model compression approaches are interdependent, applying them systematically is crucial, as demonstrated by the winning solutions of the Low-Power Image Recognition Challenge (LPIRC) in 2017 and 2018. Since contemporary hardware platforms typically contain heterogeneous processing elements, exploiting them effectively by parallelizing the neural network becomes increasingly important for performance. An even more effective strategy, advocated in this paper, is to design a network architecture tailored to the specific hardware platform. For detailed information on each technique, the paper provides corresponding references.
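As a concrete illustration of two of the compression techniques named above, the following minimal sketch applies magnitude-based (L1) pruning and post-training dynamic quantization to a small PyTorch model. The toy model, the 50% pruning ratio, and the choice of dynamic INT8 quantization are illustrative assumptions, not the pipeline used by the paper or the LPIRC solutions.

```python
# Minimal sketch (illustrative assumptions): magnitude-based pruning followed by
# post-training dynamic quantization of a small fully connected classifier.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A toy model standing in for a vision backbone (illustrative only).
model = nn.Sequential(
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

# 1) Pruning: zero out the 50% smallest-magnitude weights in each Linear layer.
#    The 50% ratio is an assumed hyperparameter, not a value from the paper.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the sparsity permanent

# 2) Quantization: convert Linear weights to INT8 with dynamic activation quantization.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Sanity check: the compressed model still produces an output of the expected shape.
x = torch.randn(1, 256)
print(quantized(x).shape)  # torch.Size([1, 10])
```

Even in this small example, the two steps interact: quantization error can undo accuracy recovered after pruning, which is the interdependence the abstract cites as the reason for applying compression techniques systematically rather than in isolation.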