Cosine Activation in Compact Network (CACN): Application to Scene Classification

Lei Zhang, Xiantong Zhen, Wei Zhang, Zhiping Jian, Xin Li

doi:10.1109/access.2019.2926839

Open Access

Journal: IEEE Access
Publication Date: Jan 1, 2019
Citations: 1
License type: CC BY 4.0

Affiliation: Guangdong University of Petrochemical Technology

Abstract

In this paper, we propose a new learning architecture named cosine activation in a compact network (CACN). The CACN is derived from kernel approximation and establishes a nonlinear hidden layer with the cosine activation function. By inheriting fusion ability in kernel approximation while learning parameters in a supervised way, the CACN is a well-directed solution to scene classification. By seamlessly connecting with convolutional neural networks (CNNs), it is easy to construct an end-to-end network. To compensate for the loss of spatial layout information in CNNs, the CACN is further combined with spatial pyramid matching to fuse various information into one holistic picture. The experiments on the MIT indoor and SUN397 datasets show that the CACN delivers high performance and demonstrates its great effectiveness for scene-classification tasks.

Highlights

Generalized scene classification can be applied in multimedia on enhanced audio [1], preprocessed video and denoised image signals [2], [3]
The convolutional neural network (CNN) + CACN model reported in the following tables is obtained by keeping the layers of the VGG model before the fully connected layer unchanged and connecting them to a CACN whose parameters are trained on the new dataset
CACN fuses information through a cosine activation function, which is well explained from the kernel approximation perspective
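
The kernel approximation view of a cosine activation can be illustrated with random Fourier features. The following is a minimal sketch, not the authors' implementation: the function name `cosine_layer`, the dimensions, the bandwidth `sigma`, and the random sampling of `W` and `b` are all assumptions made for illustration. In the CACN these parameters are learned in a supervised way rather than left at random values.

```python
import numpy as np

def cosine_layer(x, W, b):
    """Hidden layer with cosine activation: sqrt(2/D) * cos(x W + b)."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(x @ W + b)

# Hypothetical sizes: input dimension d, hidden width D, RBF bandwidth sigma.
rng = np.random.default_rng(0)
d, D, sigma = 16, 20000, 4.0

# Random Fourier feature parameters for the Gaussian (RBF) kernel.
# Assumption: a trainable layer could start from such values.
W = rng.normal(0.0, 1.0 / sigma, size=(d, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

x = rng.normal(size=d)
y = rng.normal(size=d)

# The inner product of the cosine features approximates the kernel value.
approx = cosine_layer(x, W, b) @ cosine_layer(y, W, b)
exact = np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))
print(approx, exact)
```

With a wide hidden layer the two printed values should be close, which is the sense in which a cosine hidden layer approximates a kernel feature map.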

Summary

INTRODUCTION

Generalized scene classification can be applied in multimedia on enhanced audio [1], preprocessed video and denoised image signals [2], [3]. With the growing size and number of categories of open scene-classification datasets such as MIT indoor [8], SUN397 [9] and Places [10], the scene-classification task is moving much closer to real applications. On these datasets, automatically distinguishing scene information with high accuracy remains a challenge, even though human beings do it easily at a single glance. Although the above frameworks based on BoW or VLAD/FV demonstrate impressive performance, their descriptive ability in scene classification is limited by the loss of spatial layout information. To compensate for this shortcoming, [16] combines spatial pyramid matching with the bag-of-words model. The experimental results show that CACN achieves state-of-the-art performance, which demonstrates its effectiveness for scene classification.
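
The idea of recovering spatial layout with a spatial pyramid can be sketched as pooling a feature map over progressively finer grids and concatenating the results. This is only an illustrative assumption: the function `spatial_pyramid_pool`, the grid levels, and the use of max pooling on a CNN feature map are hypothetical choices, and the summary above does not specify how the paper combines the pyramid with the CACN.

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Max-pool an (H, W, C) feature map over n x n grids and concatenate."""
    H, W, C = fmap.shape
    pooled = []
    for n in levels:
        # Cell boundaries; linspace also handles sizes not divisible by n.
        rows = np.linspace(0, H, n + 1).astype(int)
        cols = np.linspace(0, W, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = fmap[rows[i]:rows[i + 1], cols[j]:cols[j + 1], :]
                # One C-dimensional descriptor per spatial cell.
                pooled.append(cell.max(axis=(0, 1)))
    return np.concatenate(pooled)

# Hypothetical 14 x 14 feature map with 8 channels.
rng = np.random.default_rng(0)
fmap = rng.normal(size=(14, 14, 8))
features = spatial_pyramid_pool(fmap)
print(features.shape)
```

The coarsest level summarizes the whole image, while finer levels keep where in the image each response occurred, so the concatenated vector carries layout information that a single global pooling step would discard.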

RELATED WORK

EXPERIMENTS AND RESULTS

CONCLUSION