Deep learning is notoriously resource-intensive, so reducing its environmental impact is crucial. In this paper, we propose a novel model compression method to mitigate the energy demands of deep learning for a greener, more sustainable AI landscape. Our approach relies on an asymmetric, weakly differentiable pruning function that leverages weight statistics to incorporate adaptive pruning directly into the quantization mechanism. This enables us to achieve higher global compression rates while simultaneously reducing energy consumption and minimizing classification performance degradation. We evaluated the efficacy of our approach using three distinct models on three distinct datasets: cerebral emboli detection (HITS), epileptic seizure recognition (ESR), and MNIST. Across all models and datasets, our method achieved a better balance between compression, energy consumption, and classification performance than other state-of-the-art extreme quantization methods. On the HITS dataset with a two-dimensional convolutional neural network, for example, we achieved gains of 50.6% and 54.9% in the compression rate of the global model and of the quantized layers only, respectively, and a 52.1% reduction in energy consumption, while improving the Matthews correlation coefficient by 2.5% compared to other approaches. The code is available at: https://github.com/yamilvindas/pTTQ.
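To make the core idea concrete, below is a minimal sketch of statistics-based ternary quantization with asymmetric pruning thresholds, assuming PyTorch. The function name `ternary_quantize`, the threshold rule based on the weight mean and standard deviation, and the fixed scaling factors `w_p`/`w_n` are illustrative assumptions, not the paper's actual pTTQ definition (see the linked repository for that).

```python
import torch

def ternary_quantize(w, t=1.0, w_p=1.0, w_n=1.0):
    """Ternarize a weight tensor using statistics-based, asymmetric thresholds.

    Weights within t standard deviations of the mean are pruned to zero;
    the rest are mapped to +w_p or -w_n. Illustrative sketch only: the
    paper's pruning function and learned scaling factors may differ.
    """
    mu, sigma = w.mean(), w.std()
    hi, lo = mu + t * sigma, mu - t * sigma  # asymmetric around mu != 0
    q = torch.zeros_like(w)
    q[w > hi] = w_p    # large positive weights -> +w_p
    q[w < lo] = -w_n   # large negative weights -> -w_n
    return q

# Toy usage: a weight tensor with a nonzero mean makes the pruning
# interval asymmetric, so more weights on one side are zeroed out.
w = torch.randn(256, 128) + 0.05
w_q = ternary_quantize(w, t=1.0)
print(f"sparsity: {(w_q == 0).float().mean().item():.2%}")
```

In this sketch, widening the interval [lo, hi] (larger t) prunes more weights, trading classification performance for compression and energy savings; the paper makes this trade-off adaptable during training.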