DCTCNet: Sequency discrete cosine transform convolution network for visual recognition.

Jiayong Bao, Jiangshe Zhang, Chunxia Zhang, Lili Bao

doi:10.1016/j.neunet.2025.107143

Abstract

The discrete cosine transform (DCT) is widely used in computer vision tasks because it offers a high compression ratio and high-quality visual presentation. However, the conventional DCT is sensitive to the size of the transform region, which produces blocking effects. Eliminating these blocking effects so that the DCT can serve vision tasks efficiently is therefore both important and challenging. In this paper, we introduce the All Phase Sequency DCT (APSeDCT) into convolutional networks to extract multi-frequency information from deep features. Because APSeDCT is equivalent to a convolutional operation, we construct a corresponding convolution module, APSeDCT Convolution (APSeDCTConv), whose transferability is similar to that of vanilla convolution. We then propose an augmented convolutional operator, MultiConv, built on APSeDCTConv. Replacing the last three bottleneck blocks of ResNet with MultiConv not only reduces the computational cost and the number of parameters but also yields strong performance in classification, object detection, and instance segmentation. Extensive experiments show that APSeDCTConv augmentation consistently improves image classification on ImageNet across various models and scales, including ResNet, Res2Net, and ResNeXt, and improves AP over the baseline by 0.5%-1.1% for object detection and 0.4%-0.7% for instance segmentation on the COCO benchmark.
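The claim that a DCT-type transform can be expressed as a convolution can be illustrated with a minimal sketch. The code below uses the ordinary orthonormal DCT-II, not APSeDCT, whose definition the abstract does not give; the function names `dct_matrix`, `dct_filters`, and `dct_as_conv` are hypothetical and chosen only for this illustration. It builds the n*n separable DCT basis images as fixed filters and slides them over an image with stride 1, so each output channel holds one frequency component of every overlapping window.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (each row is one basis vector)."""
    k = np.arange(n).reshape(-1, 1)  # frequency index
    i = np.arange(n).reshape(1, -1)  # sample index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)       # rescale the DC row for orthonormality
    return c


def dct_filters(n):
    """Fixed filter bank of shape (n, n, n, n); filter (u, v) is outer(C[u], C[v])."""
    c = dct_matrix(n)
    return np.einsum("ui,vj->uvij", c, c)


def dct_as_conv(image, n):
    """Correlate a 2D image with all DCT filters (stride 1, no padding).

    Returns an array of shape (n, n, H - n + 1, W - n + 1): one output
    channel per frequency pair (u, v).
    """
    filters = dct_filters(n)
    # windows has shape (H - n + 1, W - n + 1, n, n)
    windows = np.lib.stride_tricks.sliding_window_view(image, (n, n))
    return np.einsum("hwij,uvij->uvhw", windows, filters)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.standard_normal((8, 8))
    n = 4
    out = dct_as_conv(image, n)

    # The response at window position (0, 0) equals the 2D DCT of that block.
    c = dct_matrix(n)
    block_dct = c @ image[:n, :n] @ c.T
    print(np.allclose(out[:, :, 0, 0], block_dct))
```

Because the filters are fixed rather than learned, a layer of this kind adds no trainable weights for the transform itself; how the paper combines such responses inside MultiConv is not specified in the abstract and is not modeled here.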
