Abstract

Besides 3D object supervision, auxiliary disparity supervision is usually indispensable when training a stereo-based 3D object detector. The disparity supervision is either transformed from LiDAR points or generated by pre-trained models. However, the former suffers from the high cost of LiDAR devices and their over-sensitivity to airborne particles, and the latter from the limited cross-dataset transferability of contemporary stereo matching models. To alleviate these problems, we propose a self-supervision framework for stereo-based 3D detection that relies on neither LiDAR nor external models. A Depth-based Self-supervision (DSelf) is proposed to unify the coordinate spaces of the self-supervised losses and of detection into a single 3D space. However, the DSelf supervision is dense compared with sparse LiDAR points, which introduces redundancy and irrelevant information into the stereo features. A Semantic-Aware Sampler (SASampler) is proposed to address these problems by sampling foreground and background pixels in an unbalanced manner. Combining the SASampler with the DSelf supervision, the resulting detector (named S3D) achieves state-of-the-art detection results without explicit disparity supervision.
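
To make the sampling idea concrete: the abstract does not specify how the SASampler works internally, but the general technique of unbalanced foreground/background pixel sampling can be sketched as below. This is a minimal illustrative sketch in PyTorch, assuming a hypothetical `sample_pixels` helper, a binary foreground mask, and arbitrarily chosen sample counts; it is not the paper's actual SASampler.

```python
import torch

def sample_pixels(fg_mask: torch.Tensor, n_fg: int = 2048, n_bg: int = 512) -> torch.Tensor:
    """Unbalanced sampling of foreground/background pixel indices (hypothetical sketch).

    fg_mask: (H, W) boolean tensor marking foreground (object) pixels.
    Returns flat indices into the H*W pixel grid. Oversampling the
    foreground keeps a dense self-supervised loss focused on
    detection-relevant pixels instead of the dominant background.
    """
    flat = fg_mask.flatten()
    fg_idx = flat.nonzero(as_tuple=False).squeeze(1)
    bg_idx = (~flat).nonzero(as_tuple=False).squeeze(1)
    # Sample with replacement so masks smaller than n_fg/n_bg still work.
    fg_sel = fg_idx[torch.randint(len(fg_idx), (n_fg,))] if len(fg_idx) else fg_idx
    bg_sel = bg_idx[torch.randint(len(bg_idx), (n_bg,))] if len(bg_idx) else bg_idx
    return torch.cat([fg_sel, bg_sel])

# Usage: restrict a dense per-pixel depth loss to the sampled subset.
mask = torch.zeros(128, 256, dtype=torch.bool)
mask[40:80, 100:160] = True  # toy foreground region
idx = sample_pixels(mask)
dense_loss = torch.rand(128 * 256)  # stand-in for a per-pixel loss map
sparse_loss = dense_loss[idx].mean()
```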
