Abstract

This paper proposes a novel Hierarchical Parsing Net (HPN) for semantic scene parsing. Unlike previous methods, which classify each object independently, HPN leverages global scene semantic information and the context among multiple objects to enhance scene parsing. On the one hand, HPN uses the global scene category to constrain the semantic consistency between the scene and each object. On the other hand, the context among all objects is also modeled to avoid incompatible object predictions. Specifically, HPN consists of four steps. In the first step, we extract scene and local appearance features. Based on these appearance features, the second step encodes a contextual feature for each object, which models both the scene-object context (the context between the scene and each object) and the inter-object context (the context among different objects). In the third step, we classify the global scene and backpropagate the scene classification loss to constrain the scene feature encoding. In the fourth step, a label map for scene parsing is generated from the local appearance and contextual features. Our model outperforms many state-of-the-art deep scene parsing networks on five scene parsing databases.
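To make the four-step pipeline concrete, the sketch below expresses it as a small PyTorch module. This is a minimal illustration under assumed design choices, not the authors' implementation: the stand-in backbone, global average pooling for the scene feature, the 3x3 context convolution, and all names and dimensions (`HPNSketch`, `feat_dim`, `num_scenes`, and so on) are assumptions introduced for illustration.

```python
# Minimal sketch of the four HPN steps described in the abstract.
# All layer choices, names, and sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HPNSketch(nn.Module):
    def __init__(self, num_classes=19, num_scenes=10, feat_dim=64):
        super().__init__()
        # Step 1: local appearance features (a stand-in backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Step 2: contextual feature encoding. The input concatenates the
        # local feature with the broadcast global scene feature
        # (scene-object context); the 3x3 kernel mixes features of
        # neighbouring objects (inter-object context).
        self.context_conv = nn.Conv2d(2 * feat_dim, feat_dim, 3, padding=1)
        # Step 3: scene classifier whose loss constrains the scene feature.
        self.scene_fc = nn.Linear(feat_dim, num_scenes)
        # Step 4: per-pixel classifier over appearance + contextual features.
        self.pixel_head = nn.Conv2d(2 * feat_dim, num_classes, 1)

    def forward(self, x):
        local = self.backbone(x)                    # step 1: B x C x H x W
        scene = local.mean(dim=(2, 3))              # global scene feature, B x C
        scene_map = scene[:, :, None, None].expand_as(local)
        context = F.relu(self.context_conv(
            torch.cat([local, scene_map], dim=1)))  # step 2
        scene_logits = self.scene_fc(scene)         # step 3
        label_map = self.pixel_head(
            torch.cat([local, context], dim=1))     # step 4: B x K x H x W
        return label_map, scene_logits

# Joint training: the per-pixel parsing loss plus the scene classification
# loss, whose gradients flow back into the shared scene feature encoding.
model = HPNSketch()
img = torch.randn(2, 3, 64, 64)
pixel_gt = torch.randint(0, 19, (2, 64, 64))
scene_gt = torch.randint(0, 10, (2,))
label_map, scene_logits = model(img)
loss = F.cross_entropy(label_map, pixel_gt) + F.cross_entropy(scene_logits, scene_gt)
loss.backward()
```

In this reading, the scene classification term plays the role of step three: minimizing it jointly with the parsing loss constrains the scene feature that is broadcast into every contextual feature.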
