Abstract

Facade image parsing is essential to the semantic understanding and 3-D reconstruction of urban scenes. Considering the occlusion and appearance ambiguity present in single-view images, together with the ease of acquiring multiple views, in this letter we propose a multiview enhanced deep architecture for facade parsing. The highlight of this architecture is a cross-view feature aggregation module that learns to choose and fuse useful convolutional neural network (CNN) features from nearby views to enhance the representation of a target view. Benefiting from the multiview enhanced representation, the proposed architecture can better handle ambiguity and occlusion. Moreover, the cross-view feature aggregation module can be straightforwardly integrated into existing single-image parsing frameworks. Extensive comparison experiments and ablation studies demonstrate the strong performance of the proposed method and the validity and transferability of the cross-view feature aggregation module.
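To make the idea of cross-view feature aggregation concrete, the sketch below shows one plausible, attention-style realization in PyTorch. It is not the authors' implementation; the module name, the 1x1-convolution scoring scheme, and the assumption that nearby-view feature maps have already been warped into alignment with the target view are all illustrative assumptions.

```python
# Minimal sketch (not the paper's code) of a cross-view feature
# aggregation module: per-pixel weights select useful features from
# nearby views, which are fused into the target-view representation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossViewFeatureAggregation(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv scores how useful a nearby-view feature is at each pixel.
        self.score = nn.Conv2d(channels * 2, 1, kernel_size=1)
        # 1x1 conv fuses the aggregated features back into the target view.
        self.fuse = nn.Conv2d(channels * 2, channels, kernel_size=1)

    def forward(self, target_feat, nearby_feats):
        # target_feat:  (B, C, H, W) CNN features of the target view
        # nearby_feats: list of (B, C, H, W) features from nearby views,
        #               assumed already aligned to the target view
        scores = [self.score(torch.cat([target_feat, f], dim=1))
                  for f in nearby_feats]                      # each (B,1,H,W)
        weights = F.softmax(torch.cat(scores, dim=1), dim=1)  # (B,V,H,W)
        # Weighted sum over views: "choose and fuse" useful features.
        agg = sum(weights[:, i:i + 1] * f for i, f in enumerate(nearby_feats))
        return self.fuse(torch.cat([target_feat, agg], dim=1))

if __name__ == "__main__":
    module = CrossViewFeatureAggregation(channels=64)
    target = torch.randn(2, 64, 32, 32)
    nearby = [torch.randn(2, 64, 32, 32) for _ in range(3)]
    print(module(target, nearby).shape)  # torch.Size([2, 64, 32, 32])
```

Because the module only consumes and produces feature maps of the same shape, it could be dropped between the encoder and decoder of an existing single-image parsing network, which is consistent with the integration claim in the abstract.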
