Abstract
Current attention and transform modules in Convolutional Neural Networks (CNNs) are designed to be lightweight and in-place. Typically, the channel dimension of the input feature maps is first reduced to lower the computation cost; a transformation is then applied, for example to extract weight maps or to project the features into another space; finally, the channel dimension is increased back so that the output feature maps match the size of the input. The layers commonly used to change the channel dimension, $1\times 1$ convolutional layers or fully connected layers, are simple and effective, but they require learnable parameters and consume additional memory and other computation resources. We propose a novel parameter-free method, named Channel Transformer Network (CTN), that decreases or increases the channel dimension for these modules while preserving most of the information at lower computational complexity. We also introduce a Video Co-segment Attentive Network (VCAN) for person re-identification (ReID), which strengthens a pedestrian's salient representation across multiple video frames. We embed CTN into Non-local, CBAM, COSAM, and VCAN blocks to replace their $1\times 1$ convolutional or fully connected layers. Experiments with VCAN and the CTN-embedded models on the MARS dataset for person ReID show strong computation efficiency and accuracy; in particular, VCAN reaches 90.05% Rank-1. We believe CTN can also be applied to other vision tasks such as image classification and object detection.
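As context for the design problem the abstract describes, the following is a minimal sketch of the conventional squeeze-transform-restore pattern used by attention modules such as Non-local or CBAM, implemented here with learnable $1\times 1$ convolutions. The class name, the reduction ratio, and the sigmoid re-weighting are illustrative assumptions, not the paper's implementation; CTN's contribution is to replace these parameterized channel-changing layers with a parameter-free transform.

```python
import torch
import torch.nn as nn

class BottleneckAttention(nn.Module):
    """Hypothetical example of the conventional channel squeeze-and-restore
    pattern that CTN aims to replace; not the paper's method."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        reduced = channels // reduction
        # 1x1 convolutions: simple and effective, but they add learnable
        # parameters and extra memory/computation cost.
        self.squeeze = nn.Conv2d(channels, reduced, kernel_size=1)   # decrease channel dimension
        self.transform = nn.Conv2d(reduced, reduced, kernel_size=1)  # placeholder transformation
        self.restore = nn.Conv2d(reduced, channels, kernel_size=1)   # increase channel dimension back

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Produce a weight map the same size as the input and re-weight in place.
        w = torch.sigmoid(self.restore(self.transform(self.squeeze(x))))
        return x * w


if __name__ == "__main__":
    feats = torch.randn(2, 64, 32, 32)
    print(BottleneckAttention(64)(feats).shape)  # torch.Size([2, 64, 32, 32])
```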