Abstract
In this paper, we propose a new learning architecture named cosine activation in a compact network (CACN). The CACN is derived from kernel approximation and establishes a nonlinear hidden layer with the cosine activation function. By inheriting the fusion ability of kernel approximation while learning its parameters in a supervised way, the CACN is well suited to scene classification. Because it connects seamlessly with convolutional neural networks (CNNs), it is easy to construct an end-to-end network. To compensate for the loss of spatial layout information in CNNs, the CACN is further combined with spatial pyramid matching to fuse various information into one holistic picture. Experiments on the MIT indoor and SUN397 datasets show that the CACN delivers high performance, demonstrating its effectiveness for scene-classification tasks.
Highlights
Generalized scene classification can be applied in multimedia on enhanced audio [1], preprocessed video and denoised image signals [2], [3]
The convolutional neural networks (CNNs) + CACN model in the following tables is obtained by keeping the CNN layers of the VGG model that precede the fully connected layer unchanged and connecting them to a CACN whose parameters are trained on a new dataset
CACN fuses information through a cosine activation function, which can be explained from the kernel approximation perspective
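To make the kernel approximation view of a cosine activation concrete, the following is a minimal sketch of a hidden layer of the form sqrt(2/D) * cos(xW + b). It assumes the kernel being approximated is an RBF kernel and that the weights are initialized as random Fourier features; the source states only that the CACN is derived from kernel approximation, so the kernel choice, the initialization, and the names `cosine_layer`, `init_rff_params`, and `gamma` are illustrative assumptions and not the authors' implementation. In the CACN the parameters would then be learned in a supervised way, which this sketch does not cover.

```python
import numpy as np

def cosine_layer(x, W, b):
    """Hidden layer with cosine activation: sqrt(2/D) * cos(x @ W + b)."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(x @ W + b)

def init_rff_params(d, D, gamma, rng):
    """Random Fourier feature initialization (an assumption of this sketch)
    for the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    # Frequencies are drawn from the kernel's spectral density N(0, 2*gamma*I).
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, D))
    # Phases are drawn uniformly from [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return W, b

rng = np.random.default_rng(0)
d, D, gamma = 8, 20000, 0.5          # input dim, hidden width, kernel width
W, b = init_rff_params(d, D, gamma, rng)

x = 0.5 * rng.normal(size=(1, d))
y = 0.5 * rng.normal(size=(1, d))

# The inner product of the cosine features approximates the kernel value.
approx = (cosine_layer(x, W, b) @ cosine_layer(y, W, b).T)[0, 0]
exact = np.exp(-gamma * np.sum((x - y) ** 2))
print(f"approx={approx:.4f} exact={exact:.4f}")
```

With a wide enough hidden layer the two printed values should be close, which is the sense in which a cosine-activated layer can stand in for a kernel feature map.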
Summary
Generalized scene classification can be applied in multimedia on enhanced audio [1], preprocessed video and denoised image signals [2], [3]. With the increasing scale and number of categories of open scene-classification datasets such as MIT indoor [8], SUN397 [9] and Places [10], the scene-classification task has moved much closer to real applications. On these datasets, it remains a challenge to distinguish scene information automatically, even though human beings can do so at a single glance. Although the BoW and VLAD/FV frameworks demonstrate impressive performance, they have limited descriptive ability in scene classification because they lose spatial layout information. To compensate for this shortcoming, [16] combines spatial pyramid matching with the bag-of-words model. The experimental results show that CACN achieves state-of-the-art performance, which demonstrates its effectiveness for scene classification.
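The idea of recovering spatial layout with a spatial pyramid can be sketched as pooling a feature map over progressively finer grids and concatenating the results. The example below is only an illustration under stated assumptions: it uses max pooling over a convolutional feature map with pyramid levels of 1x1, 2x2, and 4x4 cells, whereas the source does not specify how the CACN is combined with spatial pyramid matching. The function name `spatial_pyramid_pool` and the `levels` parameter are hypothetical.

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Max-pool an (H, W, C) feature map over n x n grids for each n in
    `levels` and concatenate the per-cell descriptors.
    Assumes H and W are at least max(levels) so that no cell is empty."""
    H, W, C = fmap.shape
    feats = []
    for n in levels:
        # Integer cell boundaries that cover the whole map.
        hs = np.linspace(0, H, n + 1).astype(int)
        ws = np.linspace(0, W, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = fmap[hs[i]:hs[i + 1], ws[j]:ws[j + 1], :]
                feats.append(cell.max(axis=(0, 1)))
    return np.concatenate(feats)

rng = np.random.default_rng(0)
fmap = rng.random((14, 14, 512))     # stand-in for a CNN feature map
pooled = spatial_pyramid_pool(fmap)
print(pooled.shape)
```

The first C entries are the global descriptor, and the remaining entries describe coarse-to-fine regions, so the concatenated vector keeps some of the layout that a single global pooling step would discard.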