Abstract
This study explores the combined use of fixed-point quantization and structured pruning to optimize the performance and efficiency of convolutional neural networks (CNNs) in image classification tasks. These techniques reduce model size and computational complexity, making CNNs more suitable for deployment in resource-constrained environments such as mobile devices and embedded systems. Fixed-point quantization lowers the bit-width of weights and activations, thereby reducing the computational load and memory footprint. Structured pruning, in turn, systematically removes unimportant convolutional filters or channels, which further shrinks the model and increases inference speed. An experimental evaluation was performed on the ImageNet dataset using the ResNet-50 architecture. The results show that the combined quantization-and-pruning strategy reduces the model size by up to 75% and increases inference speed by 50%, while maintaining a classification accuracy of 74.5%, compared to 76.4% for the baseline model. Given the substantial efficiency gains, this 1.9-percentage-point accuracy drop is an acceptable trade-off. The results demonstrate that the integrated approach effectively compresses and accelerates the CNN model without a significant loss of accuracy, making it well suited for real-time applications.
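The two techniques the abstract combines can be sketched in a few lines. The NumPy sketch below is illustrative only: the function names, the per-tensor symmetric quantization scheme, and the L1-norm filter-importance criterion are assumptions for the example, not the paper's exact method.

```python
import numpy as np

def quantize_fixed_point(w, bits=8):
    """Symmetric per-tensor fixed-point quantization to `bits` bits
    (an assumed scheme for illustration)."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8-bit
    scale = np.max(np.abs(w)) / qmax      # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale                       # dequantize as q * scale

def prune_filters_l1(conv_w, keep_ratio=0.5):
    """Structured pruning: keep the output filters with the largest
    L1 norms. conv_w has shape (out_channels, in_channels, kH, kW);
    whole filters are removed, so the remaining layer stays dense."""
    norms = np.abs(conv_w).reshape(conv_w.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(conv_w.shape[0] * keep_ratio))
    keep = np.sort(np.argsort(norms)[::-1][:n_keep])  # surviving filter indices
    return conv_w[keep], keep

# Example on a randomly initialized 64-filter 3x3 conv layer
w = np.random.randn(64, 3, 3, 3).astype(np.float32)
q, s = quantize_fixed_point(w, bits=8)          # int8 weights + float scale
pruned, kept = prune_filters_l1(w, keep_ratio=0.25)  # 16 filters remain
```

Because pruning removes entire filters rather than individual weights, the resulting layer maps directly onto standard dense convolution kernels, which is why structured pruning translates into real inference speedups on commodity hardware.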