Abstract
3-D lane detection is a challenging task due to the diversity of lanes, occlusions, glare, and other factors. Traditional methods usually rely on highly specialized handcrafted features and carefully designed postprocessing to detect lanes. However, these methods depend on strong assumptions and a single modality, so they scale poorly and deliver weak performance. In this article, a multimodal fusion network (MFNet) built on multihead nonlocal attention and a feature pyramid is proposed for 3-D lane detection. It consists of three parts: a multihead deformable transformation (MDT) module, a multidirectional attention feature pyramid fusion (MA-FPF) module, and a top-view lane prediction (TLP) module. First, MDT is presented to learn and mine multimodal features from RGB images, depth maps, and point cloud data (PCD) for optimal lane feature extraction. Then, MA-FPF is designed to fuse multiscale features and prevent the vanishing of lane features as the network deepens. Finally, TLP is developed to estimate 3-D lanes and predict their positions. Experimental results on the 3-D lane synthetic and ONCE-3DLanes datasets demonstrate that the proposed MFNet outperforms state-of-the-art methods in both quantitative analyses and visual comparisons.
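The abstract gives no implementation details, so purely as an illustration of the multihead-attention multimodal fusion idea it names, the following PyTorch sketch fuses RGB, depth, and point-cloud feature tokens with standard multihead attention. All class names, dimensions, and the token layout here are assumptions for the sketch, not the paper's actual MDT or MA-FPF design.

```python
import torch
import torch.nn as nn


class MultimodalAttentionFusion(nn.Module):
    """Illustrative sketch (not the paper's module): fuse per-modality
    feature tokens with multihead self-attention plus a residual."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb, depth, pcd):
        # Concatenate the three modalities along the token axis: (B, N_total, dim).
        tokens = torch.cat([rgb, depth, pcd], dim=1)
        # Self-attention lets every token attend across all modalities.
        fused, _ = self.attn(tokens, tokens, tokens)
        # Residual connection followed by layer normalization.
        return self.norm(tokens + fused)


# Toy usage with random features: batch of 2, 64 tokens per modality.
rgb = torch.randn(2, 64, 256)
depth = torch.randn(2, 64, 256)
pcd = torch.randn(2, 64, 256)
out = MultimodalAttentionFusion()(rgb, depth, pcd)
print(out.shape)  # torch.Size([2, 192, 256])
```

Concatenating modality tokens before self-attention is one common way to realize cross-modal fusion; the actual MDT module in the paper may instead use deformable attention or a different token arrangement.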