Abstract

Many dense correspondence tasks in computer vision seek spatially smooth results. A typical approach is to smooth the matching costs with edge-preserving filters. However, local filters yield only locally optimal results because they consider only the costs within a small support window, and non-local filters based on a minimum spanning tree (MST) tend to overuse the piecewise-constant assumption. In this paper, we propose a linear-time non-local cost aggregation method based on two complementary spatial tree structures. Geodesic distances in both the spatial and intensity domains along the tree structures are used to measure pixel similarity, and the final aggregated cost is the sum of the outputs of the two trees. On each tree, the filtering output of a pixel can be obtained by recursively aggregating the costs along eight sub-trees with linear time complexity. The two spatial tree structures differ only in the order in which the filtering is performed. Experiments on optical flow estimation and stereo matching with the Middlebury and KITTI datasets demonstrate the effectiveness and efficiency of our method, which outperforms typical MST-based non-local filters in cost aggregation. Moreover, we compare handcrafted features with deep features learned by convolutional neural networks (CNNs) for computing the matching cost. The code will be available soon.
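The idea of linear-time aggregation over a spatial tree with geodesic weights can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce their eight-sub-tree recursion. It rests on several assumptions. The two complementary trees are modeled as separable "rows then columns" and "columns then rows" filtering. The edge weight between neighbours is a hypothetical `exp(-(1/sigma_s + |dI|/sigma_r))`, so the product of weights along a path equals `exp(-geodesic distance)`. All names (`aggregate_chain`, `two_tree_aggregate`, `sigma_s`, `sigma_r`) and the default sigma values are illustrative.

```python
import math
from typing import List

# Hypothetical names and parameters; an illustrative sketch, not the paper's code.
Grid = List[List[float]]


def aggregate_chain(cost: List[float], intensity: List[float],
                    sigma_s: float, sigma_r: float) -> List[float]:
    """Non-local aggregation on a 1D chain (a tree with no branches).

    Returns A[p] = sum_q S(p, q) * cost[q], where S(p, q) is the product of
    the edge weights on the path between p and q, i.e. exp(-geodesic distance).
    One forward and one backward recursive pass give O(n) time.
    """
    n = len(cost)
    # Assumed edge weight between neighbours i and i+1: a unit spatial step
    # scaled by sigma_s plus the absolute intensity difference scaled by sigma_r.
    w = [math.exp(-1.0 / sigma_s - abs(intensity[i + 1] - intensity[i]) / sigma_r)
         for i in range(n - 1)]
    fwd = list(cost)  # fwd[i]: weighted contributions from pixels 0..i
    for i in range(1, n):
        fwd[i] = cost[i] + w[i - 1] * fwd[i - 1]
    bwd = list(cost)  # bwd[i]: weighted contributions from pixels i..n-1
    for i in range(n - 2, -1, -1):
        bwd[i] = cost[i] + w[i] * bwd[i + 1]
    # cost[i] itself appears in both passes, so subtract it once.
    return [fwd[i] + bwd[i] - cost[i] for i in range(n)]


def transpose(grid: Grid) -> Grid:
    return [list(col) for col in zip(*grid)]


def rows_then_cols(cost: Grid, image: Grid, sigma_s: float, sigma_r: float) -> Grid:
    """Aggregate along every row, then along every column of the result."""
    rows = [aggregate_chain(c, i, sigma_s, sigma_r) for c, i in zip(cost, image)]
    cols = [aggregate_chain(c, i, sigma_s, sigma_r)
            for c, i in zip(transpose(rows), transpose(image))]
    return transpose(cols)


def cols_then_rows(cost: Grid, image: Grid, sigma_s: float, sigma_r: float) -> Grid:
    """Same filtering with the order swapped: columns first, then rows."""
    return transpose(rows_then_cols(transpose(cost), transpose(image), sigma_s, sigma_r))


def two_tree_aggregate(cost: Grid, image: Grid,
                       sigma_s: float = 10.0, sigma_r: float = 10.0) -> Grid:
    """Sum of the outputs of the two complementary filtering orders."""
    a = rows_then_cols(cost, image, sigma_s, sigma_r)
    b = cols_then_rows(cost, image, sigma_s, sigma_r)
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
```

Each pass visits every pixel a constant number of times, so the cost per disparity (or flow) label is linear in the number of pixels. In a stereo setting, one would typically call `two_tree_aggregate` on each disparity slice of the cost volume with the same guidance image and then pick the label with the lowest aggregated cost per pixel.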
