Visual trackers can determine the trajectories of rigid 3D objects with six degrees of freedom (6-DOF). This functionality is required by many types of applications, such as augmented reality and robotics, and its usefulness is commonly measured by the accuracy, robustness, and execution time of the tracker. Several works have contributed to improving these techniques, most notably optimization-based, learning-based, and hybrid methods, that is, methods that use both cooperatively. However, despite the variety of 3D object tracking techniques, they generally struggle to track efficiently in challenging environments, such as those with occlusion or fast rotation and translation of the object. In this context, we propose improvements to the 6-DOF tracking of arbitrary 3D objects based on particle swarm optimization that aim to overcome these limitations. This type of tracker follows a top-down approach in which, for each video frame, a set of pose hypotheses is created and optimized to find the pose as close as possible to the tracked object's real pose. To perform this task, a fitness function evaluates each hypothesis using 3D scene information obtained from RGB-D sensors. The proposed approach addresses problems that in practice reduce the efficiency of this optimization, associated with the dynamism of the search space, the definition of the search subspace boundaries at each iteration, the exclusion of the global optimum from the search subspace, premature convergence to local optima, the different types of object occlusion, the definition of a fitness function independent of the tracked object's class and the environment's characteristics, and the execution time required to process all steps of the particle swarm optimization. In this scenario, we therefore present a tracking algorithm with a novel fitness function based on the harmonic mean of 3D coordinate, color, and normal vector distances. We also introduce a new approach for computing the boundaries of the solution subspace at runtime by observing the inertia of the target object, and we present a robust method for filtering the visible model points. Finally, to improve execution time, we employ dynamic region-of-interest selection in the scene point cloud and implement the particle swarm optimization algorithm entirely on the GPU. Experiments show that these changes improve accuracy, robustness, and execution time. Compared to a state-of-the-art learning-based technique, our tracker was, on average, 19.3% and 16.3% more accurate with respect to translation and rotation errors, respectively, and exhibited 43.2% fewer tracking failures. Our tracker was also 5–17 times faster than an existing particle swarm optimization-based technique.
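To make the harmonic-mean aggregation concrete, below is a minimal Python sketch of a fitness score over matched model/scene points. The variable names, normalizations, and the assumption that each visible model point has already been matched to a nearest scene point are illustrative; the paper's exact formulation (including visibility filtering, occlusion handling, and GPU parallelization) may differ.

```python
import numpy as np

def harmonic_mean_fitness(model_pts, model_colors, model_normals,
                          scene_pts, scene_colors, scene_normals,
                          eps=1e-9):
    """Sketch of a fitness based on the harmonic mean of 3D coordinate,
    color, and normal vector distances (one row per matched point pair).

    Assumes the model points were transformed by the pose hypothesis and
    matched to scene points beforehand; a hypothetical setup, not the
    paper's exact pipeline.
    """
    # Per-point distance terms (hypothetical normalizations).
    d_coord = np.linalg.norm(model_pts - scene_pts, axis=1)        # Euclidean
    d_color = np.linalg.norm(model_colors - scene_colors, axis=1)  # e.g., RGB
    # Angular distance between unit normals via their dot product.
    cos_sim = np.clip(np.sum(model_normals * scene_normals, axis=1), -1.0, 1.0)
    d_normal = np.arccos(cos_sim)

    # Harmonic mean of the three distances for each matched point:
    #   H = 3 / (1/d_coord + 1/d_color + 1/d_normal)
    h = 3.0 / (1.0 / (d_coord + eps)
               + 1.0 / (d_color + eps)
               + 1.0 / (d_normal + eps))

    # Aggregate over all matched points; a lower value indicates a pose
    # hypothesis that better explains the observed RGB-D scene.
    return h.mean()
```

In the full method, a score of this kind would be evaluated for every particle's pose hypothesis at each iteration of the swarm, which is why the abstract emphasizes running all steps of the optimization on the GPU.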