This paper introduces an efficient representation and feature extraction technique for 3D object pose estimation, together with a novel mechanism for exploiting the extracted visual cues. Combining fuzzy clustering of the input space with supervised learning yields a problem of reduced dimensionality and an efficient mapping of the input–output space. Whereas other neural network-based approaches to 3D pose estimation reduce dimensionality based on input space characteristics, as PCA-based methods do, the proposed scheme directly targets the input–output mapping, based on the available visual data. Evaluation results provide evidence of low generalization error when estimating the 3D pose of objects, with the best performance achieved when employing Radial Basis Functions. The proposed system can be adopted in several computer vision applications requiring object localization, pose estimation and target tracking.