Abstract

Given a 2D image query and a pool of 3D objects, the goal of image-object retrieval is to rank the 3D objects by how well their content matches the query. Previous methods usually project 2D images and 3D objects into a joint embedding space and minimize a distance metric between them to perform retrieval. Because 2D images and 3D objects come from two domains with a large discrepancy, the gap between their feature distributions remains significant even after both are mapped to a shared space, which leads to domain misalignment. In this work, we propose a novel image-object retrieval method based on optimal transport theory. Specifically, to bridge the dimensionality gap between 2D images and 3D objects, we first represent each 3D object as a sequence of its 2D projections. We then design a Cross-Domain View Attention (CDVA) module that automatically computes the optimal combination of 3D object projections for a given 2D query image. Next, we use a Weighted Optimal Transport (WOT)-based distance to measure the discrepancy between 2D images and 3D objects, and reduce this discrepancy to achieve instance-level alignment. Under this scheme, the transported 2D images and the 3D objects with the same label are constrained to follow similar distributions. Finally, we design an explicit Category Centroid Alignment (CCA) module that performs class-level alignment to further improve retrieval performance. Extensive experiments show that our method achieves competitive performance on the MI3DOR and MI3DOR-2 benchmarks.
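
To make the two central ingredients above more concrete, the following is a minimal sketch in plain Python, not the paper's implementation. It assumes dot-product softmax attention as a stand-in for query-conditioned view weighting, and standard entropic (Sinkhorn) optimal transport with user-supplied marginal weights as a stand-in for a weighted OT distance. The function names `attend_views` and `sinkhorn_cost`, the squared Euclidean cost, and the parameters `reg` and `iters` are all illustrative assumptions; the abstract does not specify how CDVA or WOT are actually defined.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend_views(query, views):
    """Pool per-view features into one object feature.

    Assumption: weights come from a softmax over query-view dot products.
    """
    weights = softmax([dot(query, v) for v in views])
    pooled = [sum(w * v[i] for w, v in zip(weights, views))
              for i in range(len(query))]
    return pooled, weights

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sinkhorn_cost(xs, ys, a, b, reg=0.5, iters=200):
    """Entropic OT cost between weighted point sets (xs, a) and (ys, b).

    a and b are non-negative sample weights that each sum to 1.
    Returns the transport cost and the transport plan.
    """
    n, m = len(xs), len(ys)
    cost = [[sq_dist(x, y) for y in ys] for x in xs]
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return total, plan

# Toy usage: one 2D query feature, one object with three view features.
query = [1.0, 0.0]
views = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
pooled, weights = attend_views(query, views)

# Toy usage: transport cost between two small, uniformly weighted feature sets.
images = [[0.0, 0.0], [1.0, 0.0]]
objects_near = [[0.0, 0.0], [1.0, 0.0]]
objects_far = [[3.0, 0.0], [4.0, 0.0]]
uniform = [0.5, 0.5]
near_cost, near_plan = sinkhorn_cost(images, objects_near, uniform, uniform)
far_cost, far_plan = sinkhorn_cost(images, objects_far, uniform, uniform)
```

In this toy setup the view most similar to the query receives the largest attention weight, and the feature set that lies closer to the image features yields the lower transport cost. In a training setting, a cost of this kind would be minimized as a loss rather than only evaluated.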
