Abstract

Current artificial neural networks mainly conduct learning in the spatial domain and neglect learning in the frequency domain. However, learning performed in the frequency domain can be more efficient than learning in the spatial domain. In this paper, we fully explore frequency domain learning and propose a joint learning paradigm of the frequency and spatial domains. This paradigm takes full advantage of the complementary strengths of the two: frequency domain learning effectively captures intrinsic global information, whereas spatial domain learning captures local information. To achieve this, an innovative yet effective linear learning block is proposed to conduct the learning process directly in the frequency domain. Combined with the prevailing spatial learning operation, i.e., convolution, this block forms a powerful and scalable joint learning framework. Extensive experiments on diverse benchmark datasets (KITTI, Make3D, and Cityscapes) demonstrate the effectiveness and superiority of the proposed joint learning paradigm in dense image prediction tasks, including self-supervised depth estimation, ego-motion estimation, and semantic segmentation. In particular, the proposed model achieves performance competitive with that of state-of-the-art methods in all three tasks, even without pretraining. Moreover, for self-supervised depth estimation on the KITTI dataset, the proposed model reduces the number of parameters by over 78% while keeping its time complexity on par with other state-of-the-art methods, which makes it promising for real-world applications. We hope that the proposed method will encourage more research on cross-domain learning. The code is publicly available at https://github.com/shaochengJia/FSLNet.
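Why does a linear operation in the frequency domain capture global information? A minimal sketch, assuming a generic "global filter" design (FFT, elementwise multiplication by learnable complex weights, inverse FFT) rather than the exact block used in FSLNet: by the convolution theorem, multiplying by per-frequency weights is equivalent to a circular convolution with a kernel as large as the whole feature map, so every output pixel depends on every input pixel. The function names `frequency_linear_block` and `circular_conv2d` are hypothetical and used only for illustration.

```python
import numpy as np


def frequency_linear_block(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Hypothetical frequency-domain linear layer (not the authors' block).

    x:      real feature map of shape (C, H, W)
    weight: complex per-channel, per-frequency weights of shape (C, H, W // 2 + 1)
    """
    X = np.fft.rfft2(x, axes=(-2, -1))             # spatial -> frequency domain
    Y = X * weight                                  # elementwise linear mixing per frequency
    return np.fft.irfft2(Y, s=x.shape[-2:], axes=(-2, -1))  # back to spatial domain


def circular_conv2d(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Naive per-channel circular convolution with a full-size (H, W) kernel."""
    H, W = x.shape[-2:]
    y = np.zeros_like(x)
    for m in range(H):
        for n in range(W):
            # np.roll(k, (m, n))[i, j] == k[(i - m) % H, (j - n) % W]
            y += x[..., m:m + 1, n:n + 1] * np.roll(k, shift=(m, n), axis=(-2, -1))
    return y


rng = np.random.default_rng(0)
C, H, W = 2, 8, 8
x = rng.standard_normal((C, H, W))
k = rng.standard_normal((C, H, W))      # stands in for a learned spatial kernel
weight = np.fft.rfft2(k, axes=(-2, -1))  # equivalent frequency-domain weights

y_freq = frequency_linear_block(x, weight)
y_spat = circular_conv2d(x, k)
print(np.allclose(y_freq, y_spat))  # True: same linear map, computed in two domains

# Perturb a single input pixel in channel 0 and see which outputs change.
x_pert = x.copy()
x_pert[0, 0, 0] += 1.0
diff = frequency_linear_block(x_pert, weight) - y_freq
print(np.all(np.abs(diff[0]) > 1e-9))   # every output pixel in channel 0 changes
print(np.allclose(diff[1], 0.0))        # no cross-channel mixing in this sketch
```

The frequency-domain path costs O(HW log HW) per channel regardless of the effective kernel size, whereas the spatial loop above scales with the kernel area; this illustrates the efficiency and global-context argument in general terms. Note that this sketch assumes periodic (circular) boundaries and omits channel mixing, both of which a practical block would likely handle differently; it is complementary to small-kernel convolutions that capture local detail.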
