Local CNN Research Articles

RGB-D image-based scene recognition has improved significantly with the development of deep learning methods. While convolutional neural networks can learn high-level semantic features for object recognition, these methods still have limitations for RGB-D scene classification. One limitation is that learning better multi-modal features for RGB-D scene recognition remains an open problem. Another is that scene images are usually not object-centric and exhibit great spatial variability, so vanilla full-image CNN features may not be optimal for scene recognition. Considering these problems, in this paper we propose a compact and effective framework for RGB-D scene recognition. Specifically, we make the following contributions: 1) a novel RGB-D scene recognition framework is proposed to explicitly learn the global modal-specific and local modal-consistent features simultaneously. Different from existing approaches, local CNN features are considered for the learning of modal-consistent representations; 2) a Key Feature Selection (KFS) module is designed, which can adaptively select important local features from the high-level semantic CNN feature maps. It is more efficient and effective than object detection and dense patch-sampling based methods; and 3) a triplet correlation loss and a spatial-attention similarity loss are proposed for the training of the KFS module. Under the supervision of the proposed loss functions, the network can learn important local features of the two modalities with no need for extra annotations. Finally, by concatenating the global and local features, the proposed framework achieves new state-of-the-art scene recognition performance on the SUN RGB-D dataset and the NYU Depth version 2 (NYUD v2) dataset.
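The final fusion step described in this abstract, concatenating a global feature with a few selected local features, can be illustrated with a small sketch. This is not the paper's KFS module: the abstract does not specify how selection is computed, so the sketch below swaps in a plain top-k selection over externally supplied scores. The function names, the toy feature map, and the scores are all hypothetical.

```python
def global_feature(fmap):
    """Average-pool a list of local feature vectors into one global vector."""
    n = len(fmap)
    return [sum(v[d] for v in fmap) / n for d in range(len(fmap[0]))]

def top_k_local(fmap, scores, k):
    """Keep the k local vectors with the highest scores, in original order.

    Assumption: a simple top-k stand-in, not the learned KFS selection.
    """
    ranked = sorted(range(len(fmap)), key=lambda i: scores[i], reverse=True)
    return [fmap[i] for i in sorted(ranked[:k])]

def fuse(fmap, scores, k):
    """Concatenate the global vector with the selected local vectors."""
    fused = list(global_feature(fmap))
    for v in top_k_local(fmap, scores, k):
        fused.extend(v)
    return fused

# Hypothetical 4-location feature map with 2 channels per location.
fmap = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0], [0.0, 0.0]]
scores = [0.1, 0.7, 0.9, 0.05]  # hypothetical importance scores
fused = fuse(fmap, scores, k=2)
```

Here the fused vector holds the 2 pooled channels followed by the 2 selected local vectors, so its length is 2 + 2 × 2 = 6. In the framework the abstract describes, the selection is learned under the proposed losses rather than read from fixed scores.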

While convolutional neural networks (CNNs) have been excellent for object recognition, the greater spatial variability in scene images typically means that the standard full-image CNN features are suboptimal for scene classification. In this article, we investigate a framework allowing greater spatial flexibility, in which the Fisher vector (FV)-encoded distribution of local CNN features, obtained from a multitude of region proposals per image, is considered instead. The CNN features are computed from an augmented pixel-wise representation consisting of multiple modalities of RGB, HHA, and surface normals, as extracted from RGB-D data. More significantly, we make two postulates: (1) component sparsity—that only a small variety of region proposals and their corresponding FV GMM components contribute to scene discriminability, and (2) modal nonsparsity—that features from all modalities are encouraged to coexist. In our proposed feature fusion framework, these are implemented through regularization terms that apply group lasso to GMM components and exclusive group lasso across modalities. By learning and combining regressors for both proposal-based FV features and global CNN features, we are able to achieve state-of-the-art scene classification performance on the SUNRGBD Dataset and NYU Depth Dataset V2. Moreover, we further apply our feature fusion framework on an action recognition task to demonstrate that our framework can be generalized for other multimodal well-structured features. In particular, for action recognition, we enforce interpart sparsity to choose more discriminative body parts, and intermodal nonsparsity to make informative features from both appearance and motion modalities coexist. Experimental results on the JHMDB and MPII Cooking Datasets show that our feature fusion is also very effective for action recognition, achieving very competitive performance compared with the state of the art.
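The two regularizers named in this abstract behave in opposite ways, which a toy calculation makes concrete. The sketch below uses the standard textbook definitions (group lasso as a sum of per-group L2 norms, exclusive group lasso as a sum of squared per-group L1 norms); the abstract does not give the paper's exact formulation, and the grouping and weights here are hypothetical.

```python
import math

def group_lasso(groups):
    """Sum of per-group L2 norms; favors switching whole groups off."""
    return sum(math.sqrt(sum(w * w for w in g)) for g in groups)

def exclusive_group_lasso(groups):
    """Sum of squared per-group L1 norms; favors sparsity inside each
    group while keeping every group active."""
    return sum(sum(abs(w) for w in g) ** 2 for g in groups)

# Two hypothetical weight layouts with the same total absolute weight (7.0).
concentrated = [[3.0, 4.0], [0.0, 0.0]]  # all weight in one group
spread = [[3.0, 0.0], [0.0, 4.0]]        # one active weight per group

gl_concentrated = group_lasso(concentrated)
gl_spread = group_lasso(spread)
egl_concentrated = exclusive_group_lasso(concentrated)
egl_spread = exclusive_group_lasso(spread)
```

Group lasso assigns the lower penalty to the concentrated layout, while exclusive group lasso assigns the lower penalty to the spread layout. That matches the two postulates: groups formed over GMM components can be pruned wholesale (component sparsity), whereas groups formed over modalities are pushed to keep each modality represented (modal nonsparsity).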

Articles published on Local CNN

Comprehensive Tennis Serve Training System Based on Local Attention-Based CNN Model

A multi-granularity hierarchical network for long- and short-term forecasting on multivariate time series data

GLNET: global–local CNN's-based informed model for detection of breast cancer categories from histopathological slides

Satellite Image Classification Using Extended Local Binary Patterns, SVM AND CNN

Online Prediction of Cutting Temperature Using Self-Adaptive Local Learning and Dynamic CNN

DSACNN: Dynamically local self-attention CNN for 3D point cloud analysis

Image Retrieval of Tourism Landscape in Rural Revitalization Based on Wireless Communication Network

Local Binary CNN for Diabetic Retinopathy Classification on Fundus Images

Named Entity Recognition in Electric Power Metering Domain Based on Attention Mechanism

Thorax disease classification with attention guided convolutional neural network

Evaluation of Local Descriptors and Deep CNN Features for Face Anti Spoofing

Integrating Local CNN and Global CNN for Script Identification in Natural Scene Images

RGB-D Scene Recognition via Spatial-Related Multi-Modal Feature Learning

Multimodal activity recognition with local block CNN and attention-based spatial weighted CNN

Structure-Aware Multimodal Feature Fusion for RGB-D Scene Classification and Beyond

Deep Multi-View Spatial-Temporal Network for Taxi Demand Prediction

Superpixel-Based Multiple Local CNN for Panchromatic and Multispectral Image Classification

Robust face detection using local CNN and SVM based on kernel combination

SPATIAL DEPTH EXTRACTION USING RANDOM STEREOGRAMS IN ANALOGIC CNN FRAMEWORK
