Abstract
In this paper, we focus on devising a versatile framework for dense pixelwise prediction whose goal is to assign a discrete or continuous label to each pixel for an image. It is well-known that the reduced feature resolution due to repeated subsampling operations poses a serious challenge to Fully Convolutional Network (FCN) based models. In contrast to the commonly-used strategies, such as dilated convolution and encoder-decoder structure, we introduce the Flattening Module to produce high-resolution predictions without either removing any subsampling operations or building a complicated decoder module. In addition, the Flattening Module is lightweight and can be easily combined with any existing FCNs, allowing the model builder to trade off among model size, computational cost and accuracy by simply choosing different backbone networks. We empirically demonstrate the effectiveness of the proposed Flattening Module through competitive results in human pose estimation on MPII, semantic segmentation on PASCAL-Context and object detection on PASCAL VOC. We hope that the proposed approach can serve as a simple and strong alternative of current dominant dense pixelwise prediction frameworks.
Highlights
Many fundamental computer vision tasks can be formulated as a dense pixelwise prediction problem
We introduce a novel scheme to produce dense pixelwise predictions based on the proposed lightweight Flattening Module while avoiding either removing any subsampling operations or building a complex decoder module
FLATTENET we firstly present a general framework for addressing the dense pixelwise prediction problem, from which our specific instantiation is derived, and introduce the Flattening Module
Summary
Many fundamental computer vision tasks can be formulated as a dense pixelwise prediction problem. There is a growing interest in reducing anchor boxes based object detection to a pixelwise prediction problem [12]–[15]. Deep learning methods, and in particular deep convolutional neural networks (DCNNs) based on the Fully Convolutional Network (FCN) framework [16], have achieved tremendous success in such dense pixelwise prediction tasks. It is well-known that the major issue for current FCN based models is the reduced feature
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.