Abstract

As sensing modalities proliferate, extracting information from multimodal data has become a prominent research topic. Current multimodal approaches for imagery and LiDAR typically discard the point-to-point topological relationships of the latter to keep dimensions matched. To address this problem, we propose a cascaded cross-modal network (CCMN) that extracts joint features directly from high-resolution aerial imagery and LiDAR point clouds, rather than from their abridged derivatives. First, point-wise features are extracted from the raw LiDAR data by a front-end 3D extractor. Next, the LiDAR-derived features undergo a spatial reference conversion that projects and aligns them to the image coordinate space. Finally, cross-modal compounds combining the resulting feature maps with the corresponding images are fed into a U-shaped structure to produce the segmentation results. Experimental results indicate that our strategy surpasses a popular multimodal baseline by 6% in mIoU.
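As a concrete illustration, the following PyTorch sketch mirrors the three stages outlined above: a point-wise 3D feature extractor, a spatial reference conversion that rasterizes the per-point features onto the image grid, and a U-shaped segmentation head fed with the cross-modal compound. All module sizes, the affine georeference-to-pixel transform, the class count, and the helper names (PointFeatureExtractor, project_to_image, UNetSegmenter) are hypothetical placeholders, not details taken from the paper.

```python
# Minimal sketch of the cascaded cross-modal pipeline; all sizes and
# names are illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn

class PointFeatureExtractor(nn.Module):
    """Point-wise 3D feature extractor (PointNet-style shared MLP)."""
    def __init__(self, out_dim=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, out_dim),
        )

    def forward(self, points):            # points: (N, 3) xyz
        return self.mlp(points)           # (N, out_dim) per-point features

def project_to_image(points, feats, transform, h, w, out_dim):
    """Rasterize per-point features onto the image grid.

    `transform` is a hypothetical 2x3 affine matrix mapping georeferenced
    (x, y) LiDAR coordinates to pixel (row, col); features of points that
    fall on the same pixel are accumulated.
    """
    xy1 = torch.cat([points[:, :2], torch.ones(points.shape[0], 1)], dim=1)
    px = (xy1 @ transform.T).long()       # (N, 2) pixel (row, col)
    valid = (px[:, 0] >= 0) & (px[:, 0] < h) & (px[:, 1] >= 0) & (px[:, 1] < w)
    idx = px[valid, 0] * w + px[valid, 1]
    fmap = torch.zeros(out_dim, h, w)
    fmap.view(out_dim, -1).index_add_(1, idx, feats[valid].T)
    return fmap                           # (out_dim, h, w), aligned to the image

class UNetSegmenter(nn.Module):
    """Tiny U-shaped head taking the RGB + projected-LiDAR compound."""
    def __init__(self, in_ch, num_classes=6):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU())
        self.up = nn.Sequential(
            nn.ConvTranspose2d(32, 32, 2, stride=2), nn.ReLU(),
            nn.Conv2d(32, num_classes, 1))

    def forward(self, x):
        return self.up(self.down(x))

# Usage with random stand-ins for an aerial tile and its LiDAR points.
h, w = 64, 64
image = torch.rand(1, 3, h, w)
points = torch.rand(1000, 3) * 64
feats = PointFeatureExtractor(out_dim=16)(points)
transform = torch.tensor([[1.0, 0.0, 0.0],
                          [0.0, 1.0, 0.0]])  # identity georef -> pixel map
fmap = project_to_image(points, feats, transform, h, w, out_dim=16)
compound = torch.cat([image, fmap.unsqueeze(0)], dim=1)   # (1, 3+16, h, w)
logits = UNetSegmenter(in_ch=19)(compound)                # (1, 6, h, w)
```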
