Abstract

Traditional stereo dense image matching (DIM) methods normally predefine a fixed window to compute the matching cost, and their performance is limited by the matching window size. A large matching window usually achieves robust matching results in weakly textured regions, but it may cause over-smoothing at disparity discontinuities and fine structures. A small window can recover sharp boundaries and fine structures, but it suffers from high matching uncertainty in weakly textured regions. To address this issue, we compute matching results with different matching window sizes and propose an adaptive method to fuse them so that a better matching result can be generated. The core algorithm designs a Convolutional Neural Network (CNN) to predict the probabilities of the large and small windows for each pixel and then refines these probabilities by imposing a global energy function. An approximate solution of the global energy function is obtained by breaking the optimization into per-pixel sub-optimizations along one-dimensional (1D) paths. Finally, the matching results of the large and small windows are fused by taking the refined probabilities as weights, yielding more accurate matching. We test our method on aerial image datasets, satellite image datasets, and the Middlebury benchmark with different matching cost metrics. Experiments show that the proposed adaptive fusion of multiple-window matching results transfers well across different datasets and outperforms the small, medium, and large windows as well as several state-of-the-art matching window selection methods.
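The final fusion step described above can be written down compactly. Below is a minimal Python/NumPy sketch of probability-weighted fusion of the two disparity maps; the array names and the linear weighting rule are illustrative assumptions rather than the authors' exact implementation, and the CNN prediction and 1D-path refinement of the probabilities are assumed to have been computed beforehand.

```python
import numpy as np

def fuse_disparities(d_small, d_large, p_large):
    """Fuse two disparity maps by per-pixel probability weighting.

    d_small : disparity map computed with the small matching window (H x W)
    d_large : disparity map computed with the large matching window (H x W)
    p_large : refined probability that the large window is the better
              choice at each pixel, values in [0, 1] (H x W)
    """
    p_large = np.clip(p_large, 0.0, 1.0)
    # Weighted combination: pixels dominated by the large window keep its
    # smooth estimate, pixels dominated by the small window keep sharp detail.
    return p_large * d_large + (1.0 - p_large) * d_small
```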

Highlights

  • The goal of stereo dense image matching (DIM) is to find pixel-wise correspondences between stereo image pairs, a problem that has attracted increasing attention in the photogrammetry and computer vision communities for decades [1,2]

  • We tested the proposed method on the above three datasets with three different matching cost metrics, Census [17], ZNCC [35], and MC-CNN-fst [22], and compared it with the matching results of 5 × 5, 9 × 9, and 15 × 15 pixel matching windows, as well as with a recent texture-based window selection method [25] that adaptively selects window sizes according to local intensity variations, a matching-confidence-based method that selects the window with the least matching uncertainty [31], and our previous window size selection network (WSSN) [32], which extracts both image texture features and disparity features with a convolutional neural network and uses fully connected layers to select the optimal window size

  • The matching results of all methods were optimized by Semi-Global Matching (SGM) and the same post-processing steps (e.g., Winner-Takes-All (WTA), a Left-Right Consistency (LRC) check, and disparity interpolation) with identical matching parameters; a minimal sketch of the WTA and LRC steps follows this list
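As a concrete illustration of the post-processing mentioned above, here is a hedged Python/NumPy sketch of Winner-Takes-All disparity selection and a Left-Right Consistency check. The function names, the 1-pixel consistency tolerance, and the use of NaN to mark invalid pixels are assumptions for illustration; the exact thresholds and the interpolation used in the experiments are not specified here.

```python
import numpy as np

def winner_takes_all(cost_volume):
    """Pick, for every pixel, the disparity with the minimum aggregated cost.

    cost_volume : (H, W, D) array of (SGM-aggregated) matching costs.
    """
    return np.argmin(cost_volume, axis=2)

def left_right_consistency(disp_left, disp_right, max_diff=1):
    """Invalidate pixels whose left and right disparities disagree.

    A left-image pixel (y, x) with disparity d should map to the right-image
    pixel (y, x - d) carrying (approximately) the same disparity.
    """
    h, w = disp_left.shape
    xs = np.arange(w)[None, :].repeat(h, axis=0)
    ys = np.arange(h)[:, None].repeat(w, axis=1)
    x_right = np.clip(xs - disp_left, 0, w - 1).astype(int)
    diff = np.abs(disp_left - disp_right[ys, x_right])
    out = disp_left.astype(float)
    out[diff > max_diff] = np.nan  # invalid pixels are later filled by interpolation
    return out
```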



Introduction

The goal of stereo dense image matching (DIM) is to find pixel-wise correspondences between stereo image pairs, a problem that has attracted increasing attention in the photogrammetry and computer vision communities for decades [1,2]. The image pairs are generally rectified into the epipolar image space so that correspondences lie in the same image row and differ only in their column coordinates; this difference is termed the disparity or parallax. Traditional DIM methods search for correspondences by comparing the similarities of their appearances (e.g., intensities, textures). Most DIM methods [10] predefine a fixed window to describe the appearance features of correspondences and compare appearance similarities by measuring the distances between these features, termed the matching cost. Various window-based matching cost metrics have been proposed over the last decades, and the differences among them mainly lie in the appearance feature descriptors used, e.g., image intensities, image gradients, and intensity rankings. Image intensity-based matching cost metrics [1,11,12] assume brightness constancy between correspondences and compute the matching cost by comparing intensities and intensity distributions; such methods are efficient and straightforward, but sensitive to noise and image radiometric distortions. Image gradient-based matching cost metrics compute gradients for each pixel and use either the gradients themselves or their distributions as feature descriptors [13–16]; such methods can compensate for intensity radiometric distortions and achieve robust matching results in textured regions. Intensity ranking-based matching costs [17–20] rank the intensities within a matching window by comparing them with the central pixel and use the ranking results as the feature descriptors. Among such methods, Census [17,21] may be the most commonly used one; it has been proven to be one of the most robust matching costs compared with other traditional methods [1].
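To make the ranking-based cost concrete, the following is a minimal Python/NumPy sketch of a Census transform and its Hamming-distance matching cost. The 5 × 5 window, the edge padding, and the strict "less than the centre" comparison are illustrative assumptions and not necessarily the exact variant evaluated in the paper.

```python
import numpy as np

def census_transform(img, window=5):
    """Binary descriptor: compare each neighbour with the central pixel.

    img    : 2-D grayscale image (H x W)
    window : odd side length of the Census window (5 is an assumption here)
    """
    r = window // 2
    h, w = img.shape
    padded = np.pad(img, r, mode="edge")
    bits = []
    for dy in range(window):
        for dx in range(window):
            if dy == r and dx == r:
                continue  # skip the central pixel itself
            neighbour = padded[dy:dy + h, dx:dx + w]
            bits.append((neighbour < img).astype(np.uint8))
    return np.stack(bits, axis=-1)  # (H, W, window*window - 1) bit descriptor

def census_cost(desc_left, desc_right_shifted):
    """Hamming distance between Census descriptors of candidate matches."""
    return np.sum(desc_left != desc_right_shifted, axis=-1)
```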

