YOLO (You Only Look Once) has become the dominant paradigm in real-time object detection. However, deploying real-time object detectors on resource-constrained platforms is challenging due to their high computational and memory demands. Quantization addresses this by compressing and accelerating CNN models, representing weights and activations with low-precision values. Nevertheless, the quantization difficulty of weights and activations is often imbalanced. In this work, we propose MSQuant, an efficient post-training quantization (PTQ) method for CNN-based object detectors that balances the quantization difficulty between activations and weights through migration scales, mitigating this disparity and thereby improving overall model accuracy. An alternating search method optimizes the migration scales, avoiding local optima and reducing quantization error. We take YOLOv5 and YOLOv8 as PTQ baselines and conduct extensive experiments on the PASCAL VOC, COCO, and DOTA datasets, exploring various combinations of quantization methods. The results demonstrate the effectiveness and robustness of MSQuant: our approach consistently outperforms other methods, yielding significant improvements in quantization performance and model accuracy.
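To illustrate the underlying idea of migrating quantization difficulty, the sketch below (not the paper's implementation; all names, the per-tensor quantizer, and the 0.5 balancing exponent are assumptions) rewrites a linear layer as X @ W = (X / s) @ (s · W), so a per-channel migration scale s shifts range from outlier-heavy activations onto weights before both are quantized:

```python
import numpy as np

def quantize(t, bits=8):
    """Symmetric uniform per-tensor quantization (quantize then dequantize)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(t).max() / qmax
    return np.round(t / scale).clip(-qmax, qmax) * scale

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 16))        # activations
X[:, 3] *= 50.0                      # one activation channel with outliers
W = rng.normal(size=(16, 8))         # weights

# Per-input-channel migration scale; the 0.5 exponent splits the
# quantization difficulty between activations and weights (an assumption
# for illustration, not the scales MSQuant searches for).
s = np.abs(X).max(axis=0) ** 0.5 / np.abs(W).max(axis=1) ** 0.5

ref = X @ W                          # full-precision reference output
err_plain = np.abs(quantize(X) @ quantize(W) - ref).mean()
err_migr = np.abs(quantize(X / s) @ quantize(s[:, None] * W) - ref).mean()
print(err_plain, err_migr)
```

Because X @ W is unchanged in full precision, the migration is lossless before quantization; the gain comes from both tensors having more balanced ranges afterwards. MSQuant's alternating search can be viewed as optimizing s directly for quantization error rather than fixing it with a heuristic exponent.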