Abstract
Object recognition in depth images is a challenging and persistent task in machine vision, robotics, and automation of sustainability. It is also a demanding component of multimedia technologies for video surveillance, human–computer interaction, robotic navigation, drone targeting, tourist guidance, and medical diagnostics. The symmetry that exists in real-world objects plays a significant role in how both humans and machines perceive and recognize objects. With advances in depth sensor technology, numerous researchers have recently proposed RGB-D object recognition techniques. In this paper, we introduce a sustainable object recognition framework that remains consistent under changes in the environment and can recognize and analyze RGB-D objects in complex indoor scenarios. First, after a depth image is acquired, the point cloud and depth maps are extracted to obtain the planes. Next, the plane fitting model and the proposed modified maximum likelihood estimation sampling consensus (MMLESAC) are applied as a segmentation process. Depth kernel descriptors (DKDES) are then computed over the segmented objects, separately for single-object and multiple-object scenarios. These DKDES are subsequently passed to isometric mapping (IsoMap) for feature space reduction. Finally, the reduced feature vector is forwarded to a kernel sliding perceptron (KSP) for object recognition. Four experiments on three datasets, using a cross-validation scheme, validate the proposed model. The experimental results on the RGB-D object, RGB-D scene, and NYUDv1 datasets show overall accuracies of 92.2%, 88.5%, and 90.5%, respectively. These results outperform existing state-of-the-art methods and verify the suitability of the method.
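To make the plane-based segmentation step more concrete, the following is a minimal sketch of sampling-consensus plane fitting with an MLESAC-style score. It is not the authors' MMLESAC: the abstract does not describe the modification, so this sketch scores each candidate plane with a fixed Gaussian-inlier plus uniform-outlier mixture (standard MLESAC would additionally estimate the mixing weight). All function names and the parameters `sigma`, `gamma`, and `span` are assumptions chosen for illustration.

```python
import math
import random


def plane_from_points(p, q, r):
    """Plane n.x + d = 0 through three points; None if they are collinear."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-12:
        return None
    n = [c / norm for c in n]
    d = -sum(n[i] * p[i] for i in range(3))
    return n, d


def residual(plane, pt):
    """Unsigned point-to-plane distance."""
    n, d = plane
    return abs(sum(n[i] * pt[i] for i in range(3)) + d)


def neg_log_likelihood(plane, points, sigma, gamma, span):
    """MLESAC-style cost: mixture of Gaussian inliers and uniform outliers.

    gamma (inlier weight) is held fixed here, which is a simplification.
    """
    cost = 0.0
    for pt in points:
        e = residual(plane, pt)
        inlier = gamma * math.exp(-e * e / (2 * sigma * sigma)) / (
            sigma * math.sqrt(2 * math.pi))
        outlier = (1 - gamma) / span
        cost -= math.log(inlier + outlier)
    return cost


def fit_plane(points, iterations=200, sigma=0.01, gamma=0.5, span=1.0, seed=0):
    """Keep the three-point hypothesis with the lowest mixture cost."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        cost = neg_log_likelihood(plane, points, sigma, gamma, span)
        if cost < best_cost:
            best, best_cost = plane, cost
    return best


def make_demo_cloud(seed=1):
    """Synthetic cloud: a noisy z = 0 plane plus clutter above it."""
    rng = random.Random(seed)
    plane_pts = [(rng.random(), rng.random(), rng.gauss(0.0, 0.005))
                 for _ in range(100)]
    clutter = [(rng.random(), rng.random(), rng.uniform(0.2, 1.0))
               for _ in range(30)]
    return plane_pts + clutter


if __name__ == "__main__":
    cloud = make_demo_cloud()
    plane = fit_plane(cloud)
    inliers = [p for p in cloud if residual(plane, p) < 0.03]
    print("normal:", plane[0], "inliers:", len(inliers))
```

In a segmentation setting, points close to the fitted plane would be removed as a supporting surface and the remaining clusters treated as object candidates. The 0.03 threshold used in the demo is likewise an illustrative value, not one taken from the paper.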
Highlights
Human beings are capable of perceiving and recognizing multiple objects in complex scenarios via biological vision
It combines histogram of oriented gradients (HOG) with an oriented response anisotropic derivative half Gaussian kernel. They ascertained the improved efficiency over scale-invariant feature transform (SIFT), gradient location and orientation histogram (GLOH), and DAISY descriptors
They fused separately processed RGB and depth images through a canonical correlation analysis (CCA) layer, and introduced a combining layer into the multi-view convolutional neural network (CNN)
Summary
Human beings are capable of perceiving and recognizing multiple objects in complex scenarios via biological vision. Numerous methods perform relatively well at classifying only the prominent objects in a complete scene, but their results are inadequate when multiple objects need to be recognized in a single dynamic scenario. These methods use different features of objects, such as global and local features, to recognize objects in the scene. In the second step, the pre-processed images are converted to point clouds and depth maps to extract planes for efficient segmentation using modified maximum likelihood estimation sampling consensus (MMLESAC). To recognize single and multiple objects in an image, a collective set of descriptors named depth kernel descriptors (DKDES) is applied to three benchmark datasets. As a final step, the reduced DKDES set is provided to a kernel sliding perceptron (KSP) for sustainable object recognition.
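As a rough illustration of the final classification step, the sketch below implements a plain binary kernel perceptron with an RBF kernel. It is not the kernel sliding perceptron (KSP) itself, whose sliding mechanism is not described in this summary. The class name, the kernel choice, and the toy XOR data standing in for reduced feature vectors are all assumptions made for illustration.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


class KernelPerceptron:
    """Binary kernel perceptron with labels in {-1, +1} (illustrative only)."""

    def __init__(self, kernel=rbf_kernel, epochs=20):
        self.kernel = kernel
        self.epochs = epochs
        self.support = []  # one (label, sample) pair per training mistake

    def decision(self, x):
        # Weighted sum of kernel evaluations against the stored mistakes.
        return sum(w * self.kernel(s, x) for w, s in self.support)

    def fit(self, X, y):
        for _ in range(self.epochs):
            mistakes = 0
            for xi, yi in zip(X, y):
                # A non-positive margin counts as a mistake: store the sample.
                if yi * self.decision(xi) <= 0:
                    self.support.append((yi, xi))
                    mistakes += 1
            if mistakes == 0:
                break  # every training sample is classified correctly
        return self

    def predict(self, x):
        return 1 if self.decision(x) > 0 else -1


if __name__ == "__main__":
    # XOR-like toy data: not linearly separable in the input space.
    X = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
    y = [-1, -1, 1, 1]
    model = KernelPerceptron().fit(X, y)
    print([model.predict(x) for x in X])
```

A multi-class setting such as the one in the paper would need an extension of this sketch, for example one classifier per object category with the highest decision value selected.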