An Encoder–Decoder Model Based on Spiking Neural Networks for Address Event Representation Object Recognition
- Conference Article
11
- 10.1109/cvprw.2018.00268
- Jun 1, 2018
In this paper, we study the effects of violating high-level scene syntactic and semantic rules on human eye-movement behavior and on deep neural scene- and object-recognition networks. An eye-movement experiment was conducted with twenty human subjects, who viewed scenes from the SCEGRAM image database and determined whether an inconsistent object was present. We examine the contribution of multiple types of features that influence eye movements while searching for an inconsistent object in a scene (e.g., the size and location of an object) by evaluating the consistency-prediction power of classifiers trained on fixation features. The eye-movement analysis and inconsistency prediction reveal that: 1) inconsistent objects are fixated significantly more than consistent objects in a scene, and 2) the distribution of fixations is the main factor influenced by the inconsistency condition of a scene, which is reflected in the ground-truth fixation maps. We also observe that the performance of deep object- and scene-recognition networks drops under violations of scene grammar. Class-specific visual saliency maps are created from the high-level representations of the convolutional layers of a deep network during scene and object recognition. We discuss whether scene inconsistencies are represented in these saliency maps by evaluating their predictive power using several well-known metrics, including AUC, SIM, and KL. The results suggest that an inconsistent object in a scene causes significant variations in the predictive power of the saliency maps.
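As a concrete illustration of two of the saliency metrics named above, the following is a minimal sketch (assuming both maps are same-size, non-negative NumPy arrays; all names are illustrative, not from the paper) of SIM and KL computed between a predicted saliency map and a ground-truth fixation map:

```python
import numpy as np

def _normalize(m):
    """Scale a map so it sums to 1, guarding against an all-zero input."""
    m = m.astype(np.float64)
    s = m.sum()
    return m / s if s > 0 else np.full_like(m, 1.0 / m.size)

def similarity(saliency, fixation_map):
    """SIM: sum of the element-wise minimum of two normalized maps (1 = identical)."""
    return float(np.minimum(_normalize(saliency), _normalize(fixation_map)).sum())

def kl_divergence(saliency, fixation_map, eps=1e-12):
    """KL: divergence of the predicted map from the fixation map (0 = identical)."""
    p = _normalize(fixation_map)
    q = _normalize(saliency)
    return float(np.sum(p * np.log(eps + p / (q + eps))))

# Example: a random prediction scored against a synthetic fixation map.
rng = np.random.default_rng(0)
pred, gt = rng.random((48, 64)), rng.random((48, 64))
print(similarity(pred, gt), kl_divergence(pred, gt))
```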
- Conference Article
38
- 10.1109/iros45743.2020.9341421
- Oct 24, 2020
Tactile perception is crucial for a variety of robot tasks, including grasping and in-hand manipulation. New advances in flexible, event-driven electronic skins may soon endow robots with touch perception capabilities similar to humans'. These electronic skins respond asynchronously to changes (e.g., in pressure or temperature) and can be laid out irregularly on the robot's body or end-effector. However, these unique features may render current deep learning approaches, such as convolutional feature extractors, unsuitable for tactile learning. In this paper, we propose a novel spiking graph neural network for event-based tactile object recognition. To exploit the local connectivity of taxels, we present several methods for organizing the tactile data in a graph structure. Based on the constructed graphs, we develop a spiking graph convolutional network. The event-driven nature of spiking neural networks makes them arguably better suited to processing event-based data. Experimental results on two tactile datasets show that the proposed method outperforms other state-of-the-art spiking methods, achieving high accuracies of approximately 90% when classifying a variety of household objects.
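The graph-construction step described above can be illustrated with a minimal sketch; the radius rule, the random taxel layout, and all names below are assumptions for illustration, not the paper's actual skin geometry:

```python
import numpy as np

def radius_graph(taxel_xy, radius):
    """Return a boolean adjacency matrix linking taxels closer than `radius`."""
    d = np.linalg.norm(taxel_xy[:, None, :] - taxel_xy[None, :, :], axis=-1)
    return (d < radius) & (d > 0)          # no self-loops

# 16 taxels on an irregular (random) layout, connected within 0.3 units.
rng = np.random.default_rng(1)
xy = rng.random((16, 2))
A = radius_graph(xy, radius=0.3)
print("edges:", int(A.sum()) // 2)
```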
- Research Article
8
- 10.1177/1729881417752820
- Jan 1, 2018
- International Journal of Advanced Robotic Systems
Object recognition is one of the essential problems in computer vision and robotics. Recently, deep learning methods have achieved excellent performance in red-green-blue (RGB) object recognition. However, the introduction of depth information presents a new challenge: how can we exploit this RGB-D data to characterize an object more adequately? In this article, we propose a principal component analysis–canonical correlation analysis network for RGB-D object recognition. In this method, two stages of cascaded filter layers are constructed, followed by binary hashing and block histograms. In the first layer, the network separately learns principal component analysis filters for RGB and depth. In the second layer, canonical correlation analysis filters are learned jointly over the two modalities. In this way, the network accounts for the distinct characteristics of the RGB and depth modalities as well as the correlation between them. Experimental results on the most widely used RGB-D object dataset show that the proposed method achieves accuracy comparable to state-of-the-art methods. Moreover, our method has a simpler structure and is efficient even without graphics processing unit acceleration.
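A minimal sketch of the first-stage idea follows: learning PCA filters from mean-removed image patches of a single modality. The patch size, filter count, and random data are assumptions, and the second-stage CCA filters (learned jointly over RGB and depth) are not shown:

```python
import numpy as np

def learn_pca_filters(images, patch=7, n_filters=8):
    """Collect mean-removed patches and keep the top principal components as filters."""
    patches = []
    for img in images:
        for i in range(img.shape[0] - patch + 1):
            for j in range(img.shape[1] - patch + 1):
                p = img[i:i + patch, j:j + patch].ravel()
                patches.append(p - p.mean())
    X = np.stack(patches)                       # (num_patches, patch*patch)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:n_filters].reshape(n_filters, patch, patch)

rng = np.random.default_rng(2)
filters = learn_pca_filters(rng.random((4, 16, 16)))
print(filters.shape)   # (8, 7, 7)
```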
- Research Article
14
- 10.1007/s00371-018-1559-x
- May 29, 2018
- The Visual Computer
This paper addresses the problem of object recognition from RGB-D data. Although deep convolutional neural networks have made progress in this area, they still suffer from a lack of large-scale, manually labeled RGB-D data. Labeling a large-scale RGB-D dataset is time-consuming and tedious. More importantly, such large-scale datasets often have a long tail, and the hard positive examples in the tail can hardly be recognized. To address these problems, we propose a multimodal self-augmentation and adversarial network (MSANet) for RGB-D object recognition, which augments the data effectively at two levels while preserving the annotations. At the first level, a series of transformations is applied to generate class-agnostic examples for each instance, which supports the training of our MSANet. At the second level, an adversarial network is proposed to generate class-specific hard positive examples while learning to classify them correctly, further improving the performance of our MSANet. With these schemes, the proposed approach achieves the best results on several available RGB-D object recognition datasets; e.g., our experiments show a 1.5% accuracy boost on the benchmark Washington RGB-D object dataset compared with the current state of the art.
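The first augmentation level can be sketched as label-preserving transformations applied jointly to the RGB and depth channels of an instance; the specific flips and crops below are illustrative assumptions, not the paper's exact series of transformations:

```python
import numpy as np

def augment_instance(rgb, depth, rng):
    """Yield (rgb, depth) variants with the original label implied: flip and crop jointly."""
    yield rgb, depth                                      # identity
    yield rgb[:, ::-1], depth[:, ::-1]                    # horizontal flip
    h, w = rgb.shape[:2]
    top, left = rng.integers(0, h // 8), rng.integers(0, w // 8)
    yield (rgb[top:top + 7 * h // 8, left:left + 7 * w // 8],
           depth[top:top + 7 * h // 8, left:left + 7 * w // 8])  # random crop

rng = np.random.default_rng(3)
rgb, depth = rng.random((64, 64, 3)), rng.random((64, 64))
print(sum(1 for _ in augment_instance(rgb, depth, rng)))  # 3 variants
```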
- Video Transcripts
- 10.48448/r2q7-9y39
- Dec 29, 2020
Accurate real-time object recognition from sensory data has long been a crucial and challenging task for autonomous driving. Even though deep neural networks (DNNs) have been widely applied in this area, their considerable processing latency, power consumption, and computational complexity have been challenging issues for real-time autonomous driving applications. In this paper, we propose an approach to real-time object recognition using spiking neural networks (SNNs). The proposed SNN model works directly on raw LiDAR temporal pulses without the pulse-to-point-cloud preprocessing step, which can significantly reduce delay and power consumption. Evaluated on various datasets derived from LiDAR and dynamic vision sensors (DVS), including Sim LiDAR, KITTI, and DVS-barrel, the proposed model shows remarkable time and power efficiency while achieving recognition performance comparable to state-of-the-art methods. This paper highlights the great potential of SNNs in autonomous driving and related applications. To the best of our knowledge, this is the first attempt to use SNNs to perform time- and energy-efficient object recognition directly on LiDAR temporal pulses in the setting of autonomous driving.
- Conference Article
11
- 10.1109/icpr48806.2021.9412302
- Jan 10, 2021
Accurate real-time object recognition from sensory data has long been a crucial and challenging task for autonomous driving. Even though deep neural networks (DNNs) have been widely applied in this area, their considerable processing latency, power consumption, and computational complexity have been challenging issues for real-time autonomous driving applications. In this paper, we propose an approach to real-time object recognition using spiking neural networks (SNNs). The proposed SNN model works directly on raw LiDAR temporal pulses without the pulse-to-point-cloud preprocessing step, which can significantly reduce delay and power consumption. Evaluated on various datasets derived from LiDAR and dynamic vision sensors (DVS), including Sim LiDAR, KITTI, and DVS-barrel, the proposed model shows remarkable time and power efficiency while achieving recognition performance comparable to state-of-the-art methods. This paper highlights the great potential of SNNs in autonomous driving and related applications. To the best of our knowledge, this is the first attempt to use SNNs to perform time- and energy-efficient object recognition directly on LiDAR temporal pulses in the setting of autonomous driving.
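A minimal sketch of the mechanism such a model builds on is shown below: a leaky integrate-and-fire (LIF) layer consuming a binary pulse train directly, with no pulse-to-point-cloud conversion. The decay, threshold, and random weights are assumptions, not the paper's parameters:

```python
import numpy as np

def lif_layer(pulses, weights, tau=0.9, v_th=1.0):
    """Integrate weighted input pulses over time; emit a spike and reset at threshold."""
    t_steps, _ = pulses.shape
    v = np.zeros(weights.shape[1])
    spikes = np.zeros((t_steps, weights.shape[1]))
    for t in range(t_steps):
        v = tau * v + pulses[t] @ weights     # leaky integration of input current
        spikes[t] = (v >= v_th).astype(float)
        v = np.where(v >= v_th, 0.0, v)       # hard reset after firing
    return spikes

rng = np.random.default_rng(4)
pulses = (rng.random((100, 32)) < 0.05).astype(float)   # sparse LiDAR-like pulses
out = lif_layer(pulses, rng.normal(0, 0.3, (32, 8)))
print("output spike counts:", out.sum(axis=0))
```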
- Research Article
35
- 10.1038/s42256-023-00650-4
- May 8, 2023
- Nature Machine Intelligence
With recent advances in learning algorithms, recurrent networks of spiking neurons are achieving performance that is competitive with vanilla recurrent neural networks. However, these algorithms are limited to small networks of simple spiking neurons and modest-length temporal sequences, as they impose high memory requirements, have difficulty training complex neuron models and are incompatible with online learning. Here, we show how the recently developed Forward-Propagation Through Time (FPTT) learning combined with novel liquid time-constant spiking neurons resolves these limitations. Applying FPTT to networks of such complex spiking neurons, we demonstrate online learning of exceedingly long sequences while outperforming current online methods and approaching or outperforming offline methods on temporal classification tasks. The efficiency and robustness of FPTT enable us to directly train a deep and performant spiking neural network for joint object localization and recognition, demonstrating the ability to train large-scale dynamic and complex spiking neural network architectures.

Memory-efficient online training of recurrent spiking neural networks without compromising accuracy is an open challenge in neuromorphic computing. Yin and colleagues demonstrate that training a recurrent neural network consisting of so-called liquid time-constant spiking neurons using an algorithm called Forward-Propagation Through Time allows for online learning and state-of-the-art performance at a reduced computational cost compared with existing approaches.
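A minimal sketch of the liquid time-constant idea follows: the membrane decay is not a fixed constant but is computed from the current input. The sigmoid gating form and all parameters below are assumptions for illustration, not the authors' exact parameterization, and the FPTT training loop is omitted:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ltc_lif_step(v, x, w_in, w_tau, v_th=1.0):
    """One step: input-dependent decay tau, leaky integration, spike, soft reset."""
    tau = sigmoid(x @ w_tau)              # per-neuron, input-conditioned time constant
    v = tau * v + (1.0 - tau) * (x @ w_in)
    s = (v >= v_th).astype(float)
    return v - s * v_th, s                # soft reset preserves residual potential

rng = np.random.default_rng(5)
w_in, w_tau = rng.normal(size=(16, 8)), rng.normal(size=(16, 8))
v = np.zeros(8)
for t in range(50):
    v, s = ltc_lif_step(v, rng.random(16), w_in, w_tau)
print("final membrane potential:", np.round(v, 3))
```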
- Conference Article
1
- 10.1109/icip42928.2021.9506331
- Sep 19, 2021
Previous research has typically used either Artificial Neural Networks (ANNs) or Spiking Neural Networks (SNNs) in isolation for object recognition. However, evidence from neuroscience suggests that visual processing in human vision is performed hierarchically through a combination of analog and digital processing. To build a more human-vision-like object recognition system, we propose a general hierarchical ANN-SNN model. We evaluate the model and its variants on two popular datasets to show its effectiveness, robustness, efficiency, and generality. Extensive experiments clearly demonstrate the superiority of the proposed models for robust object recognition.
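The hybrid pipeline can be sketched minimally: an analog ANN stage produces features that are rate-encoded into spike trains for a downstream SNN stage. The single ReLU layer, the Poisson-style encoder, and all sizes below are assumptions, not the paper's architecture:

```python
import numpy as np

def ann_features(x, w):
    """One ReLU layer standing in for the analog ANN stage."""
    return np.maximum(0.0, x @ w)

def rate_encode(feat, t_steps, rng):
    """Poisson-style encoding: spike probability proportional to feature value."""
    p = feat / (feat.max() + 1e-9)
    return (rng.random((t_steps, feat.size)) < p).astype(float)

rng = np.random.default_rng(6)
x = rng.random(64)
spikes = rate_encode(ann_features(x, rng.normal(size=(64, 32))), t_steps=100, rng=rng)
rates = spikes.mean(axis=0)               # an SNN stage would consume `spikes`
print("mean firing rate per feature:", rates.round(2))
```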
- Conference Article
17
- 10.1109/ivs.2018.8500469
- Jun 1, 2018
In recent years, data-driven methods have shown great success in extracting information about infrastructure in urban areas. These algorithms are usually trained on large datasets consisting of thousands or millions of labeled training examples. While large datasets have been published for cars, very little labeled data is available for cyclists, although the appearance, viewpoint, and positioning of the relevant objects differ. Unfortunately, labeling data is costly and requires a huge amount of work. In this paper, we therefore address the problem of learning with very few labels. The aim is to recognize particular traffic signs in crowdsourced data to collect information of interest to cyclists. We propose an object recognition system that is trained with only 15 examples per class on average. To achieve this, we combine the advantages of convolutional neural networks and random forests to learn a patch-wise classifier. In the next step, we map the random forest to a neural network and transform the classifier into a fully convolutional network. This significantly accelerates the processing of full images and allows bounding boxes to be predicted. Finally, we integrate Global Positioning System (GPS) data to localize the predictions on the map. Compared to Faster R-CNN and other networks for object recognition or algorithms for transfer learning, we considerably reduce the required amount of labeled data. We demonstrate good performance on the recognition of traffic signs for cyclists as well as their localization in maps.
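The acceleration step described, turning a patch-wise classifier into a fully convolutional one, rests on the equivalence sketched below: a dense layer applied to every flattened patch equals one cross-correlation of the image with the reshaped weights. SciPy is used for the correlation, and all shapes are illustrative assumptions:

```python
import numpy as np
from scipy.signal import correlate2d

def patchwise_scores(img, w, patch=5):
    """Slide the dense patch classifier over the image (slow reference version)."""
    h, w_img = img.shape
    out = np.zeros((h - patch + 1, w_img - patch + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = img[i:i + patch, j:j + patch].ravel() @ w
    return out

def convolutional_scores(img, w, patch=5):
    """Same dense weights reshaped to a kernel: one pass over the whole image."""
    return correlate2d(img, w.reshape(patch, patch), mode="valid")

rng = np.random.default_rng(7)
img, w = rng.random((16, 16)), rng.normal(size=25)
print(np.allclose(patchwise_scores(img, w), convolutional_scores(img, w)))  # True
```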
- Research Article
18
- 10.1016/j.imavis.2019.11.006
- Nov 9, 2019
- Image and Vision Computing
View-based weight network for 3D object recognition
- Research Article
37
- 10.1109/access.2019.2941005
- Jan 1, 2019
- IEEE Access
Although recent studies on object recognition using deep neural networks have reported remarkable performance, they usually assume that adequate object size and image resolution are available, which may not be guaranteed in real applications. This paper proposes a framework for recognizing objects in very low resolution images through the collaborative learning of two deep neural networks: an image enhancement network and an object recognition network. The image enhancement network attempts to turn extremely low resolution images into sharper, more informative images using collaborative learning signals from the object recognition network. The object recognition network, with weights trained on high resolution images, actively participates in the learning of the image enhancement network; it also uses the output of the image enhancement network as augmented training data to boost its recognition performance on very low resolution objects. Through experiments on various low resolution image benchmark datasets, we verify that the proposed method improves both image reconstruction and classification performance.
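The collaborative objective can be sketched as a reconstruction term plus a classification term supplied by the recognition network; the two linear stand-in "networks", the weighting factor, and the random data below are placeholders, not the paper's models:

```python
import numpy as np

def collaborative_loss(lr_img, hr_img, label_onehot, enhance_w, recog_w, alpha=0.5):
    """Reconstruction error plus cross-entropy of the recognizer on the enhanced image."""
    enhanced = lr_img @ enhance_w                       # stand-in enhancement network
    recon = np.mean((enhanced - hr_img) ** 2)
    logits = enhanced @ recog_w                         # stand-in recognition network
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    ce = -np.sum(label_onehot * np.log(probs + 1e-12))
    return recon + alpha * ce

rng = np.random.default_rng(8)
lr, hr = rng.random(64), rng.random(256)
y = np.eye(10)[3]
print(collaborative_loss(lr, hr, y, rng.normal(size=(64, 256)) * 0.1,
                         rng.normal(size=(256, 10)) * 0.1))
```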
- Single Book
6
- 10.1142/1695
- Jul 1, 1992
- Lightness constancy from luminance contrast, J. Skrzypek and D. Gungner
- Bringing the grandmother back into the picture - a memory-based view of object recognition, S. Edelman and T. Poggio
- Internal organization of classifier networks trained by backpropagation, D.F. Michaels
- System identification with artificial neural networks, E.R. Tisdale and W.J. Karplus
- Mixed finite element based neural networks in visual reconstruction, D. Suter
- The random neural network model for texture generation, V. Atalay, et al.
- Neural networks for collective translational invariant object recognition, L.W. Chan
- Image recognition and reconstruction using associative magnetic processing, J.M. Goodwin, et al.
- Incorporating uncertainty in neural networks, B.R. Kammerer
- Neural networks for the recognition of engraved musical scores, P. Martin and C. Bellissant
- Research Article
- 10.2298/csis240503020s
- Jan 1, 2025
- Computer Science and Information Systems
Edge computing and edge intelligence have gained significant traction in recent years due to the proliferation of Internet of Things devices, the exponential growth of data generated at the network edge, and the demand for real-time and context-aware applications. Despite its promising potential, the application of artificial intelligence on the edge faces many challenges, such as edge computing resource constraints, heterogeneity of edge devices, scalability issues, security and privacy concerns, etc. The paper addresses the challenges of deploying deep neural networks for edge intelligence and traffic object detection and recognition on video captured by edge device cameras. The primary aim is to analyze resource consumption and achieve resource-awareness, optimizing computational resources across diverse edge devices within the edge-fog computing continuum while maintaining high object detection and recognition accuracy. To accomplish this goal, a methodology is proposed and implemented that exploits the edge-to-fog paradigm to distribute the inference workload across multiple tiers of the distributed system architecture. The edge-fog related solutions are implemented and evaluated in several use cases on datasets encompassing real-world traffic scenarios and traffic object recognition problems, revealing the feasibility of deploying deep neural networks for object recognition on resource-constrained edge devices. The proposed edge-to-fog methodology demonstrates enhancements in recognition accuracy and resource utilization, validating the viability of both edge-only and edge-fog based approaches. Furthermore, experimental results demonstrate the system's adaptability to dynamic traffic scenarios, ensuring real-time recognition performance even in challenging environments.
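The edge-to-fog workload distribution can be sketched as tiered inference: a lightweight edge model answers confident frames locally and offloads uncertain ones to a heavier fog-tier model. Both stand-in models and the confidence threshold below are illustrative assumptions, not the paper's deployment:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def tiered_inference(frame_feat, edge_model, fog_model, conf_threshold=0.8):
    """Return (label, tier): fall back to the fog tier when edge confidence is low."""
    probs = edge_model(frame_feat)
    if probs.max() >= conf_threshold:
        return int(probs.argmax()), "edge"
    return int(fog_model(frame_feat).argmax()), "fog"

rng = np.random.default_rng(9)
w_edge, w_fog = rng.normal(size=(32, 5)), rng.normal(size=(32, 5)) * 2.0
edge = lambda x: softmax(x @ w_edge)      # small, cheap edge model
fog = lambda x: softmax(x @ w_fog)        # larger, sharper fog-tier model
print([tiered_inference(rng.random(32), edge, fog) for _ in range(5)])
```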
- Single Book
3
- 10.1007/3-540-63460-6
- Jan 1, 1997
- Computational complexity reduction in eigenspace approaches
- An algorithm for intrinsic dimensionality estimation
- Fully unsupervised clustering using centre-surround receptive fields with applications to colour-segmentation
- Multi-sensor fusion with Bayesian inference
- MORAL - A vision-based object recognition system for autonomous mobile systems
- Real-time pedestrian tracking in natural scenes
- Non-rigid object recognition using principal component analysis and geometric hashing
- Object identification with surface signatures
- Computing projective and permutation invariants of points and lines
- Point projective and permutation invariants
- Computing 3D projective invariants from points and lines
- 2D→2D geometric transformation invariant to arbitrary translations, rotations and scales
- Extraction of filled-in data from colour forms
- Improvement of vessel segmentation by elastically compensated patient motion in digital subtraction angiography images
- Three-dimensional quasi-binary image restoration for confocal microscopy and its application to dendritic trees
- Mosaicing of flattened images from straight homogeneous generalized cylinders
- Well-posedness of linear shape-from-shading problem
- Comparing convex shapes using Minkowski addition
- Deformation of discrete object surfaces
- Non-Archimedean normalized fields in texture analysis tasks
- The Radon transform-based analysis of bidirectional structural textures
- Textures and structural defects
- Self-calibration from the absolute conic on the plane at infinity
- A badly calibrated camera in ego-motion estimation - propagation of uncertainty
- 6DOF calibration of a camera with respect to the wrist of a 5-axis machine tool
- Automated camera calibration and 3D egomotion estimation for augmented reality applications
- Optimally rotation-equivariant directional derivative kernels
- A hierarchical filter scheme for efficient corner detection
- Defect detection on leather by oriented singularities
- Uniqueness of 3D affine reconstruction of lines with affine cameras
- Distortions of stereoscopic visual space and quadratic Cremona transformations
- Self-evaluation for active vision by the geometric information criterion
- Discrete-time rigidity-constrained optical flow
- An iterative spectral-spatial Bayesian labeling approach for unsupervised robust change detection on remotely sensed multispectral imagery
- Contrast enhancement of badly illuminated images based on Gibbs distribution and random walk model
- Adaptive non-linear predictor for lossless image compression
- Beyond standard regularization theory
- Fast stereovision by coherence detection
- Stereo matching using M-estimators
- Robust location based partial correlation
- Optimization of stereo disparity estimation using the instantaneous frequency
- Segmentation from motion: Combining Gabor- and Mallat-wavelets to overcome aperture and correspondence problem
- Contour segmentation with recurrent neural networks of pulse-coding neurons
- Multigrid MRF based picture segmentation with cellular neural networks
- Computing stochastic completion fields in linear-time using a resolution pyramid
- A Bayesian network for 3D object recognition in range data
- Improving the shape recognition performance of a model with Gabor filter representation
- Bayesian decision versus voting for image retrieval
- A structured neural network invariant to cyclic shifts and rotations
- Morphological grain operators for binary images
- A parallel 12-subiteration 3D thinning algorithm to extract medial lines
- Architectural image segmentation using digital watersheds
- Morphological iterative closest point algorithm
- Planning multiple views for 3-D object recognition and pose determination
- Fast and reliable object pose estimation from line correspondences
- Statistical 3-D object localization without segmentation using wavelet analysis
- A real-time monocular vision-based 3D mouse system
- Face recognition by elastic bunch graph matching
- A conditional mixture of neural networks for face detection, applied to locating and tracking an individual speaker
- Lipreading using Fourier transform over time
- Phantom faces for face analysis
- A new hardware structure for implementation of soft morphological filters
- A method for anisotropy analysis of 3D images
- Fast line and rectangle detection by clustering and grouping
- 1st and 2nd order recursive operators for adaptive edge detection
- Smoothing noisy images without destroying predefined feature carriers
- Local subspace method for pattern recognition
- Testing the effectiveness of Non-Linear Rectification on Gabor energy
- Neural-like thinning processing
- Detection of the objects with given shape on the grey-valued pictures
- Automatic parameter selection for object recognition using a parallel multiobjective genetic algorithm
- Unsupervised texture segmentation using Hermite transform filters
- Decomposition of the Hadamard matrices and fast Hadamard transform
- A characterization of digital disks by discrete moments
- One-step short-length DCT algorithms with data representation in the direct sum of the associative algebras
- Character extraction from scene image using fuzzy entropy and rule-based technique
- Facial image recognition using neural networks and genetic algorithms
- An energy minimisation approach to the registration, matching and recognition of images
- Error-free calculation of the convolution using generalized Mersenne and Fermat transforms over algebraic fields
- A new method of texture binarization
- Parameter optimisation of an image processing system using evolutionary algorithms
- Analysis of learning using segmentation models
- Stereo processing of image data from the Air-Borne CCD-scanner WAAC
- An adaptive method of color road segmentation
- Optical flow detection using a general noise model for gradient constraint
- Algorithmic solution and simulation results for vision-based autonomous mode of a planetary rover
- A framework for feature-based motion recovery in ground plane vehicle navigation
- Terrain reconstruction from multiple views
- Detecting motion independent of the camera movement through a log-polar differential approach
- Coordinate-free camera calibration
- A passive real-time gaze estimation system for human-machine interfaces
- An active vision system for obtaining high resolution depth information
- Conference Article
2
- 10.1109/ijcnn.2019.8851738
- Jul 1, 2019
In this paper, an active control method for visual object exploration and recognition with an unmanned aerial vehicle is presented. This work uses a convolutional neural network for visual object recognition, where input images of multiple objects are obtained with an unmanned aerial vehicle. The object recognition task is an iterative process actively controlled by a saliency map module, which extracts interesting object regions for exploration. The active control allows the unmanned aerial vehicle to autonomously explore better object regions to improve recognition accuracy. The iterative exploration stops when the probability from the convolutional neural network exceeds a decision threshold. The active control is validated with offline and real-time experiments for visual exploration and recognition of five objects, and passive exploration is also tested for performance comparison. Experiments show that the unmanned aerial vehicle is capable of autonomously exploring interesting object regions. Results also show an improvement in recognition accuracy from 88.14% for passive exploration to 95.66% for active exploration. Overall, this work offers a framework that allows robots to autonomously decide where to move and look next to improve performance during a visual object exploration and recognition task.
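The exploration loop described can be sketched minimally: regions are visited in order of saliency and per-class evidence is accumulated until the classifier's confidence clears the decision threshold. The classifier, the saliency scores, and the threshold below are placeholders, not the paper's trained components:

```python
import numpy as np

def active_explore(regions, classify, saliency, threshold=0.9, max_steps=10):
    """Visit regions in order of saliency; stop once confidence clears the threshold."""
    order = np.argsort(-saliency)               # most salient region first
    evidence = np.zeros(classify(regions[0]).size)
    for step, idx in enumerate(order[:max_steps], start=1):
        evidence += classify(regions[idx])      # accumulate per-class evidence
        probs = evidence / evidence.sum()
        if probs.max() >= threshold:
            return int(probs.argmax()), step
    return int(evidence.argmax()), len(order[:max_steps])

rng = np.random.default_rng(10)
regions = rng.random((8, 16))
w = rng.random((16, 5))
classify = lambda r: np.exp(r @ w) / np.exp(r @ w).sum()
print(active_explore(regions, classify, saliency=rng.random(8)))
```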