An attention-based fuzzy CNN-LSTM network for visual object recognition from fMRI images.

Abstract

An attention-based fuzzy CNN-LSTM network for visual object recognition from fMRI images.

Similar Papers
  • Conference Article
  • Cited by: 2
  • 10.1109/ijcnn.2019.8851738
Active visual object exploration and recognition with an unmanned aerial vehicle
  • Jul 1, 2019
  • Uriel Martinez-Hernandez + 2 more

In this paper, an active control method for visual object exploration and recognition with an unmanned aerial vehicle is presented. This work uses a convolutional neural network for visual object recognition, where input images are obtained with an unmanned aerial vehicle from multiple objects. The object recognition task is an iterative process actively controlled by a saliency map module, which extracts interesting object regions for exploration. The active control allows the unmanned aerial vehicle to autonomously explore better object regions to improve the recognition accuracy. The iterative exploration task stops when the probability from the convolutional neural network exceeds a decision threshold. The active control is validated with offline and real-time experiments for visual exploration and recognition of five objects. Furthermore, passive exploration is also tested for performance comparison. Experiments show that the unmanned aerial vehicle is capable of autonomously exploring interesting object regions. Results also show an improvement in recognition accuracy from 88.14% with passive exploration to 95.66% with active exploration. Overall, this work offers a framework that allows robots to autonomously decide where to move and look next to improve performance during a visual object exploration and recognition task.
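As a rough illustration of the loop this abstract describes, the sketch below captures an image, classifies it, and moves toward the most salient region until the classifier's confidence clears a decision threshold. The variance-based saliency stand-in, the threshold value, and the callable names (`capture`, `classify`, `move_to`) are assumptions, not the authors' code.

```python
# A toy stand-in for the paper's pipeline; helper logic and the threshold
# value are illustrative assumptions, not the authors' implementation.
import numpy as np

DECISION_THRESHOLD = 0.95  # assumed value; the paper tunes its own threshold

def most_salient_region(image, patch=64):
    """Toy saliency: return the top-left corner of the patch with the highest
    local intensity variance (a stand-in for the paper's saliency-map module)."""
    h, w = image.shape[:2]
    best, corner = -1.0, (0, 0)
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            v = image[y:y + patch, x:x + patch].var()
            if v > best:
                best, corner = v, (y, x)
    return corner

def active_recognition(capture, classify, move_to, max_steps=20):
    """capture() -> image array; classify(image) -> class-probability vector;
    move_to((y, x)) repositions the UAV toward the chosen region."""
    probs = classify(capture())
    for _ in range(max_steps):
        if probs.max() >= DECISION_THRESHOLD:
            break  # confident enough: stop exploring
        move_to(most_salient_region(capture()))  # explore the salient region
        probs = classify(capture())
    return int(np.argmax(probs)), float(probs.max())
```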

  • Conference Article
  • Cited by: 11
  • 10.1109/cvprw.2018.00268
Scene Grammar in Human and Machine Recognition of Objects and Scenes
  • Jun 1, 2018
  • Akram Bayat + 4 more

In this paper, we study the effects of violating high-level syntactic and semantic scene rules on human eye-movement behavior and on deep neural scene- and object-recognition networks. An eye-movement study was conducted with twenty human subjects who viewed scenes from the SCEGRAM image database and determined whether an inconsistent object was present. We examine the contribution of multiple types of features that influence eye movements while searching for an inconsistent object in a scene (e.g., the size and location of an object) by evaluating the consistency-prediction power of classifiers trained on fixation features. The eye-movement analysis and inconsistency prediction reveal that: 1) inconsistent objects are fixated significantly more than consistent objects in a scene, and 2) the distribution of fixations is the factor most influenced by a scene's inconsistency condition, which is reflected in the ground-truth fixation maps. We also observe that the performance of deep object and scene recognition networks drops when scene grammar is violated. Class-specific visual saliency maps are created from the high-level representations of the convolutional layers of a deep network during scene and object recognition. We discuss whether the scene inconsistencies are represented in those saliency maps by evaluating their prediction power using several well-known metrics, including AUC, SIM, and KL. The results suggest that an inconsistent object in a scene causes significant variations in the prediction power of the saliency maps.

  • Research Article
  • Cited by: 8
  • 10.1177/1729881417752820
A PCA–CCA network for RGB-D object recognition
  • Jan 1, 2018
  • International Journal of Advanced Robotic Systems
  • Shiying Sun + 3 more

Object recognition is one of the essential issues in computer vision and robotics. Recently, deep learning methods have achieved excellent performance in red-green-blue (RGB) object recognition. However, the introduction of depth information presents a new challenge: How can we exploit this RGB-D data to characterize an object more adequately? In this article, we propose a principal component analysis–canonical correlation analysis network for RGB-D object recognition. In this new method, two stages of cascaded filter layers are constructed and followed by binary hashing and block histograms. In the first layer, the network separately learns principal component analysis filters for RGB and depth. Then, in the second layer, canonical correlation analysis filters are learned jointly using the two modalities. In this way, the different characteristics of the RGB and depth modalities are considered by our network as well as the characteristics of the correlation between the two modalities. Experimental results on the most widely used RGB-D object data set show that the proposed method achieves an accuracy which is comparable to state-of-the-art methods. Moreover, our method has a simpler structure and is efficient even without graphics processing unit acceleration.
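For intuition, here is a minimal PCANet-style sketch of how the first-stage PCA filters could be learned for one modality; the patch size, stride, and filter count are assumptions, and the second-stage CCA over the paired RGB and depth responses is omitted.

```python
# Minimal single-modality sketch; patch size, stride, and filter count are
# assumptions, and the second-stage CCA over RGB+depth responses is omitted.
import numpy as np

def learn_pca_filters(images, patch=7, n_filters=8):
    """images: (N, H, W) array for one modality (grayscale RGB or depth).
    Returns n_filters convolution kernels of shape (patch, patch)."""
    patches = []
    for img in images:
        H, W = img.shape
        for y in range(0, H - patch + 1, patch):
            for x in range(0, W - patch + 1, patch):
                p = img[y:y + patch, x:x + patch].ravel()
                patches.append(p - p.mean())  # remove the patch mean
    X = np.stack(patches)                     # (num_patches, patch * patch)
    # Top principal components of the patch matrix become the stage-1 filters.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:n_filters].reshape(n_filters, patch, patch)
```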

  • Conference Article
  • Cited by: 7
  • 10.1109/ijcnn.2010.5596497
Fusing bottom-up and top-down pathways in neural networks for visual object recognition
  • Jul 1, 2010
  • Yuhua Zheng + 2 more

In this paper, an artificial neural network model is built with two pathways: a bottom-up sensory-driven pathway and a top-down expectation-driven pathway, which are fused to train the network for visual object recognition. During supervised learning, the bottom-up pathway generates hypotheses as network outputs, and the target label is then used to update the bottom-up connections. In turn, the hypotheses generated by the bottom-up pathway produce expectations on the sensory input through the top-down pathway. These expectations are constrained by the real sensory data, which is used to update the top-down connections accordingly. The two-pathway network can also be applied to semi-supervised learning with both labeled and unlabeled data, where the network generates hypotheses and the corresponding expectations. Experiments on visual object recognition suggest that the proposed model is promising for recovering objects when the sensory inputs have missing data.
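A loose PyTorch sketch of one training step under such a two-pathway scheme: the bottom-up network emits a hypothesis, the top-down network maps the softmaxed hypothesis back to an expected input, a reconstruction loss ties the expectation to the real sensory data, and the label term is simply dropped for unlabeled examples.

```python
# Layer sizes, MNIST-shaped inputs, and the unweighted loss sum are my
# assumptions; only the two-pathway structure follows the abstract.
import torch
import torch.nn as nn

bottom_up = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                          nn.Linear(256, 10))           # input -> hypothesis
top_down = nn.Sequential(nn.Linear(10, 256), nn.ReLU(),
                         nn.Linear(256, 784))           # hypothesis -> expectation
opt = torch.optim.Adam(list(bottom_up.parameters()) + list(top_down.parameters()))

def train_step(x, y=None):
    """x: (B, 1, 28, 28) images; y: labels, or None for unlabeled data."""
    hypothesis = bottom_up(x)                           # bottom-up pass
    expectation = top_down(hypothesis.softmax(dim=1))   # top-down expectation
    # The expectation is constrained by the real sensory input.
    loss = nn.functional.mse_loss(expectation, x.flatten(1))
    if y is not None:                                   # labels update bottom-up
        loss = loss + nn.functional.cross_entropy(hypothesis, y)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```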

  • Conference Article
  • Cited by: 13
  • 10.1109/cvpr.2016.65
Predicting When Saliency Maps are Accurate and Eye Fixations Consistent
  • Jun 1, 2016
  • Anna Volokitin + 2 more

Many computational models of visual attention use image features and machine learning techniques to predict eye fixation locations as saliency maps. Recently, the success of Deep Convolutional Neural Networks (DCNNs) for object recognition has opened a new avenue for computational models of visual attention due to the tight link between visual attention and object recognition. In this paper, we show that using features from DCNNs for object recognition we can make predictions that enrich the information provided by saliency models. Namely, we can estimate the reliability of a saliency model from the raw image, which serves as a meta-saliency measure that may be used to select the best saliency algorithm for an image. Analogously, the consistency of the eye fixations among subjects, i.e. the agreement between the eye fixation locations of different subjects, can also be predicted and used by a designer to assess whether subjects reach a consensus about salient image locations.
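One plausible realization of the meta-saliency idea, under assumptions the paper does not necessarily make: extract penultimate-layer features from a pretrained ResNet-18 and regress a saliency model's per-image score (e.g., its AUC against recorded fixations) from them.

```python
# The ResNet-18 feature extractor and ridge regression are my choices, not
# necessarily the paper's; `auc_scores` would come from an existing saliency
# model evaluated against recorded eye fixations.
import torch
import torchvision
from sklearn.linear_model import Ridge

backbone = torchvision.models.resnet18(weights="DEFAULT")
backbone.fc = torch.nn.Identity()   # keep penultimate-layer features
backbone.eval()

@torch.no_grad()
def features(images):
    """images: (N, 3, 224, 224) ImageNet-normalized tensor -> (N, 512) array."""
    return backbone(images).numpy()

def fit_meta_saliency(train_images, auc_scores):
    """Regress a saliency model's per-image AUC from raw-image DCNN features."""
    return Ridge(alpha=1.0).fit(features(train_images), auc_scores)
```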

  • Research Article
  • Cited by: 14
  • 10.1007/s00371-018-1559-x
MSANet: multimodal self-augmentation and adversarial network for RGB-D object recognition
  • May 29, 2018
  • The Visual Computer
  • Feng Zhou + 2 more

This paper studies the problem of object recognition using RGB-D data. Although deep convolutional neural networks have made progress in this area, they still suffer from the lack of large-scale manually labeled RGB-D data, and labeling a large-scale RGB-D dataset is time-consuming and tedious. More importantly, such large-scale datasets often exhibit a long tail, and the hard positive examples in the tail can hardly be recognized. To address these problems, we propose a multimodal self-augmentation and adversarial network (MSANet) for RGB-D object recognition, which augments the data effectively at two levels while keeping the annotations. At the first level, a series of transformations is leveraged to generate class-agnostic examples for each instance, which supports the training of our MSANet. At the second level, an adversarial network is proposed to generate class-specific hard positive examples while learning to classify them correctly, further improving the performance of our MSANet. With these schemes, the proposed approach achieves the best results on several available RGB-D object recognition datasets; for example, our experiments indicate a 1.5% accuracy boost on the benchmark Washington RGB-D object dataset compared with the current state of the art.

  • Research Article
  • Cited by: 180
  • 10.1016/j.neuron.2020.07.040
Integrative Benchmarking to Advance Neurally Mechanistic Models of Human Intelligence
  • Sep 11, 2020
  • Neuron
  • Martin Schrimpf + 5 more

  • Research Article
  • Cited by: 15
  • 10.3390/s21010113
Transfer of Learning from Vision to Touch: A Hybrid Deep Convolutional Neural Network for Visuo-Tactile 3D Object Recognition
  • Dec 27, 2020
  • Sensors (Basel, Switzerland)
  • Ghazal Rouhafzay + 2 more

Transfer of learning, or leveraging a pre-trained network and fine-tuning it to perform new tasks, has been successfully applied in a variety of machine intelligence fields, including computer vision, natural language processing, and audio/speech recognition. Drawing inspiration from neuroscience research suggesting that visual and tactile stimuli activate similar neural networks in the human brain, in this work we explore the idea of transferring learning from vision to touch in the context of 3D object recognition. In particular, deep convolutional neural networks (CNNs) pre-trained on visual images are adapted and evaluated for the classification of tactile datasets. To do so, we ran experiments with five different pre-trained CNN architectures and on five different datasets acquired with different tactile sensor technologies, including BathTip, GelSight, a force-sensing resistor (FSR) array, a high-resolution virtual FSR sensor, and the tactile sensors on the Barrett robotic hand. The results confirm the transferability of learning from vision to touch for interpreting 3D models. Owing to its higher resolution, tactile data from optical tactile sensors achieved higher classification rates with visual features than data from technologies relying on pressure measurements. Further analysis of the weight updates in the convolutional layers is performed to measure the similarity between visual and tactile features for each tactile-sensing technology. Comparing the weight updates across convolutional layers suggests that updating only a few convolutional layers of a CNN pre-trained on visual data is enough to use it efficiently for classifying tactile data. Accordingly, we propose a hybrid architecture performing both visual and tactile 3D object recognition with a MobileNetV2 backbone, chosen for its small size and hence its suitability for mobile devices, such that one network can classify both visual and tactile data. The proposed architecture achieves an accuracy of 100% on visual and 77.63% on tactile data.
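The layer-freezing recipe implied here can be sketched in PyTorch as follows; the MobileNetV2 backbone follows the paper, while the split point (fine-tuning only the last three feature blocks) and the ten-class head are illustrative assumptions.

```python
# The split point and the class count are illustrative; a recent torchvision
# (>= 0.13, with the `weights` argument) is assumed.
import torch
import torchvision

model = torchvision.models.mobilenet_v2(weights="DEFAULT")
for p in model.parameters():
    p.requires_grad = False                  # freeze the visual features...
for block in model.features[-3:]:            # ...except the last few conv blocks
    for p in block.parameters():
        p.requires_grad = True
# New classification head for the tactile classes (count is an assumption).
model.classifier[1] = torch.nn.Linear(model.last_channel, 10)

optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4)
```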

  • Conference Article
  • Cited by: 1
  • 10.1109/icme51207.2021.9428458
Small Object Recognition Using a Spatio-Temporal Neural Network
  • Jul 5, 2021
  • Zhibo Liang + 4 more

Object recognition at different scales is a fundamental problem in computer vision, and small object recognition in particular has attracted increasing attention recently. However, because they work on a single frame only, many recognizers perform unacceptably in practical scenarios: very low resolutions, barely visible small targets, extremely similar appearances, etc. Motivated by the way humans handle these challenging scenarios, this paper introduces frame sequences and an attention mechanism to compensate for the mutilated information. Specifically, it proposes a spatio-temporal neural network (dubbed STNet) for small object recognition. STNet fixes the regions of interest with a super-resolution module and focuses on the discriminative region with a spatio-temporal attention module. In addition, STNet applies a double-layer long short-term memory subnet to make full use of inter-frame information. Furthermore, the paper presents a challenging air-target recognition dataset, ATSETC4, for evaluating how well each method identifies small targets. Our model outperforms many state-of-the-art models on ATSETC4, including MobileNetV2 and SENet. In particular, STNet surpasses VGG11 by an average of 3.67%, reaching 87.50% and 82.50% at the 28 and 14 scales of ATSETC4, respectively.
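A rough PyTorch sketch of the frame-sequence pipeline just described: per-frame convolutional features, soft spatial-attention pooling, and a two-layer LSTM over time. All module sizes are illustrative, and the paper's super-resolution module is omitted.

```python
# Module sizes and the class count are illustrative assumptions; only the
# attention + double-layer LSTM structure follows the abstract.
import torch
import torch.nn as nn

class SpatioTemporalNet(nn.Module):
    def __init__(self, n_classes=4, feat=64):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(3, feat, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU())
        self.attn = nn.Conv2d(feat, 1, 1)          # 1x1 conv -> attention logits
        self.lstm = nn.LSTM(feat, 128, num_layers=2, batch_first=True)
        self.head = nn.Linear(128, n_classes)

    def forward(self, frames):                     # frames: (B, T, 3, H, W)
        B, T = frames.shape[:2]
        f = self.cnn(frames.flatten(0, 1))         # per-frame features
        a = self.attn(f).flatten(2).softmax(-1)    # soft spatial attention
        v = (f.flatten(2) * a).sum(-1)             # attended feature per frame
        out, _ = self.lstm(v.view(B, T, -1))       # double-layer LSTM over time
        return self.head(out[:, -1])               # classify from the last step
```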

  • Research Article
  • Cited by: 24
  • 10.1016/j.firesaf.2017.03.083
Fireground location understanding by semantic linking of visual objects and building information models
  • May 3, 2017
  • Fire Safety Journal
  • Florian Vandecasteele + 2 more

  • Conference Article
  • Cited by: 41
  • 10.1109/cvpr.2017.772
Deep Co-occurrence Feature Learning for Visual Object Recognition
  • Jul 1, 2017
  • Ya-Fang Shih + 5 more

This paper addresses three issues in integrating part-based representations into convolutional neural networks (CNNs) for object recognition. First, most part-based models rely on a few pre-specified object parts. However, the optimal object parts for recognition often vary from category to category. Second, acquiring training data with part-level annotation is labor-intensive. Third, modeling spatial relationships between parts in CNNs often involves an exhaustive search of part templates over multiple network streams. We tackle the three issues by introducing a new network layer, called the co-occurrence layer. It extends a convolutional layer to encode the co-occurrence between the visual parts detected by its numerous neurons, instead of a few pre-specified parts. To this end, the feature maps serve as both filters and images, and mutual correlation filtering is conducted between them. The co-occurrence layer is end-to-end trainable. The resultant co-occurrence features are rotation- and translation-invariant, and are robust to object deformation. By applying this new layer to VGG-16 and ResNet-152, we achieve recognition rates of 83.6% and 85.8% on the Caltech-UCSD bird benchmark, respectively. The source code is available at https://github.com/yafangshih/Deep-COOC.
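The mutual correlation filtering at the heart of the co-occurrence layer might be sketched with a single conv2d call, as below; treating every feature map as both a single-channel image and a single-channel kernel is my reading of the description, and the padding and peak-pooling choices are illustrative.

```python
# A hedged reading of the abstract, not the authors' implementation;
# padding and pooling choices are illustrative.
import torch
import torch.nn.functional as F

def cooccurrence_features(maps):
    """maps: (C, H, W) activations from one conv layer.
    Returns a (C, C) matrix of peak mutual-correlation responses."""
    C, H, W = maps.shape
    images = maps.unsqueeze(1)     # (C, 1, H, W): each map as an image
    filters = maps.unsqueeze(1)    # (C, 1, H, W): each map as a kernel
    # conv2d performs cross-correlation, so entry (i, j) correlates map i
    # with map j; the spatial peak is kept as the co-occurrence feature.
    resp = F.conv2d(images, filters, padding=(H // 2, W // 2))
    return resp.amax(dim=(2, 3))   # (C, C) peak responses
```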

  • Research Article
  • Cited by: 109
  • 10.1016/j.neuroimage.2017.07.018
Convolutional neural network-based encoding and decoding of visual object recognition in space and time
  • Jul 16, 2017
  • NeuroImage
  • K Seeliger + 6 more

  • Conference Article
  • Cited by: 10
  • 10.1109/i2ct.2017.8226141
A convolutional neural network for visual object recognition in marine sector
  • Apr 1, 2017
  • Aiswarya S Kumar + 1 more

Object detection and recognition are crucial elements of any high-level image analysis system. Convolutional Neural Networks (CNNs), or ConvNets, have been applied for several years to recognize the category of the principal entity in an image. One major benefit of convolutional networks is the use of shared weights in the intermediate convolutional layers, which reduces the required memory size and improves performance. In this work, we created a small database of vessels to analyse different aspects of CNNs for recognizing the types of marine vessels in sail. A large network such as a CNN may overfit due to lack of data. We address this issue by augmenting the data and varying the network parameters, achieving an accuracy of 81.6%. This will be investigated further to improve the overall efficacy of the system.
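The augmentation step mentioned above could look like the torchvision pipeline below; the specific transforms and their parameters are illustrative choices rather than the paper's exact settings.

```python
# Illustrative augmentation pipeline; not the paper's exact settings.
import torchvision.transforms as T

train_transforms = T.Compose([
    T.RandomResizedCrop(224, scale=(0.7, 1.0)),   # random crops of the vessel
    T.RandomHorizontalFlip(),                     # vessels read the same mirrored
    T.ColorJitter(brightness=0.2, contrast=0.2),  # lighting variation at sea
    T.ToTensor(),
])
```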

  • Conference Article
  • Cited by: 38
  • 10.1109/iros45743.2020.9341421
TactileSGNet: A Spiking Graph Neural Network for Event-based Tactile Object Recognition
  • Oct 24, 2020
  • Fuqiang Gu + 3 more

Tactile perception is crucial for a variety of robot tasks, including grasping and in-hand manipulation. New advances in flexible, event-driven electronic skins may soon endow robots with touch perception capabilities similar to humans'. These electronic skins respond asynchronously to changes (e.g., in pressure or temperature) and can be laid out irregularly on the robot's body or end-effector. However, these unique features may render current deep learning approaches, such as convolutional feature extractors, unsuitable for tactile learning. In this paper, we propose a novel spiking graph neural network for event-based tactile object recognition. To make use of the local connectivity of taxels, we present several methods for organizing the tactile data in a graph structure. Based on the constructed graphs, we develop a spiking graph convolutional network. The event-driven nature of spiking neural networks makes them arguably more suitable for processing event-based data. Experimental results on two tactile datasets show that the proposed method outperforms other state-of-the-art spiking methods, achieving accuracies of approximately 90% when classifying a variety of household objects.
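As one concrete instance of the graph construction described above (the paper compares several schemes), the sketch below connects every pair of taxels closer than a distance threshold; the coordinates and the threshold are assumed.

```python
# Taxel coordinates and the distance threshold are assumed; the paper
# compares several such graph-construction schemes.
import numpy as np

def taxel_graph(positions, radius=1.5):
    """positions: (N, 2) array of taxel coordinates on the skin.
    Returns an edge list connecting taxel pairs closer than `radius`."""
    d = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
    src, dst = np.nonzero((d > 0) & (d <= radius))
    return list(zip(src.tolist(), dst.tolist()))
```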

  • Research Article
  • Cited by: 18
  • 10.1016/j.imavis.2019.11.006
View-based weight network for 3D object recognition
  • Nov 9, 2019
  • Image and Vision Computing
  • Qiang Huang + 2 more
