Abstract
In this paper, we formalize memory- and tracking-based methods to perform the LiDAR-based Video Object Segmentation (VOS) task, which segments points of the specific 3D target (given in the first frame) in a LiDAR sequence. LiDAR-based VOS can directly provide target-aware geometric information for practical application scenarios like behavior analysis and anticipating danger. We first construct a LiDAR-based VOS dataset named KITTI-VOS based on SemanticKITTI, which acts as a testbed and facilitates comprehensive evaluations of algorithm performance. Next, we provide two types of baselines, i.e., memory-based and tracking-based baselines, to explore this task. Specifically, the first memory-based pipeline is built on a space–time memory network equipped with the non-local spatiotemporal attention-based memory bank. We further design a more potent variant to introduce the locality into the spatiotemporal attention module by local self-attention and cross-attention modules. For the second tracking-based baseline, we modify two representative 3D object tracking methods to adapt to LiDAR-based VOS tasks. Finally, we propose a refine module that takes mask priors and generates object-aware kernels, which could boost all the baselines’ performance. We evaluate the proposed methods on the dataset and demonstrate their effectiveness.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.