Abstract

The current state of 3D zero-shot recognition falls short of its counterpart in 2D images. A major challenge is the absence of a robust feature extractor for 3D point cloud data. Overcoming it requires collecting and processing a large dataset and feeding it into a deep-learning model capable of producing distinguishable features after training. To this end, we propose VAE-GAN3D, a model that combines a variational autoencoder (VAE) and a generative adversarial network (GAN) to supplement the small amount of training data with synthetic class examples. The deep VAE network thereby learns the underlying distribution of large and complex data, and, when combined with the GAN, it generates synthetic features that are consistent across both seen and unseen classes. Furthermore, we observe that for tasks involving a small domain of classes, existing text features contribute little to zero-shot learning. We therefore introduce image representation-based semantic features of classes, which improve zero-shot recognition performance for 3D objects. We assess the proposed model on three different datasets and present a technique for partitioning the RGB-D object dataset, which contains real-world objects, into seen and unseen classes. Our approach shows promising results on the challenges of 3D zero-shot recognition and offers a novel solution for improving the accuracy of 3D point cloud recognition.
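
To make the approach more concrete, the sketch below illustrates one way such a conditional VAE-GAN feature synthesizer could be structured, assuming PyTorch. The layer widths, the feature and semantic-embedding dimensionalities, and the equal weighting of the loss terms are illustrative assumptions, not details taken from the paper.

    # Minimal sketch of a conditional VAE-GAN feature synthesizer (assumed PyTorch).
    # Dimensions and loss weighting are illustrative assumptions, not the paper's design.
    import torch
    import torch.nn as nn

    FEAT_DIM = 1024   # dimensionality of extracted 3D point-cloud features (assumed)
    SEM_DIM = 512     # dimensionality of image-based semantic class embeddings (assumed)
    LATENT_DIM = 64   # size of the VAE latent space (assumed)

    class Encoder(nn.Module):
        """Maps a real feature plus its class embedding to a latent Gaussian."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(FEAT_DIM + SEM_DIM, 512), nn.ReLU())
            self.mu = nn.Linear(512, LATENT_DIM)
            self.logvar = nn.Linear(512, LATENT_DIM)

        def forward(self, feat, sem):
            h = self.net(torch.cat([feat, sem], dim=1))
            return self.mu(h), self.logvar(h)

    class Generator(nn.Module):
        """Decodes a latent code plus a class embedding into a synthetic feature."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(LATENT_DIM + SEM_DIM, 512), nn.ReLU(),
                nn.Linear(512, FEAT_DIM))

        def forward(self, z, sem):
            return self.net(torch.cat([z, sem], dim=1))

    class Discriminator(nn.Module):
        """Scores whether a (feature, class embedding) pair looks real."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(FEAT_DIM + SEM_DIM, 512), nn.ReLU(),
                nn.Linear(512, 1))

        def forward(self, feat, sem):
            return self.net(torch.cat([feat, sem], dim=1))

    def vae_gan_generator_loss(enc, gen, disc, feat, sem):
        """Combined VAE + adversarial loss on a batch of seen-class features."""
        mu, logvar = enc(feat, sem)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)          # reparameterization trick
        fake = gen(z, sem)

        recon = nn.functional.mse_loss(fake, feat)    # reconstruction term
        kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        adv = nn.functional.binary_cross_entropy_with_logits(
            disc(fake, sem), torch.ones(feat.size(0), 1))  # try to fool the discriminator
        return recon + kld + adv                      # equal weighting is an assumption

After training on seen-class features, synthetic features for an unseen class would be obtained by sampling z from a standard normal distribution and conditioning the generator on that class's image-based semantic embedding; a conventional classifier can then be trained on the synthesized features. A full training loop would also update the discriminator on real versus generated features, which is omitted here for brevity.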
