Abstract

Object counting is a fundamental yet challenging computer vision task, as it requires both appearance information about the object and a semantic understanding of it. In this paper, we propose an end-to-end multi-context embedding deep network for object counting (MCENet), which approaches the counting task from three different perspectives in order to count the vehicles in a traffic video frame or to estimate the number of pedestrians in a heavily congested scene. The first sub-network of MCENet extracts, from layers at different levels, the candidate features for the appearance context and the semantic context. These two features are then passed to two parallel and complementary sub-networks, which model the appearance context and the semantic context for the final count. In this way, multiple contexts are represented and embedded to assist the counting task. Extensive experimental evaluations on up to three different object counting benchmarks show that the proposed approach achieves competitive performance in all of these heterogeneous scenarios.
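The data flow described above (one shared sub-network feeding two parallel context sub-networks) can be illustrated with a minimal sketch in plain Python. It is not the MCENet implementation: the function names, the toy "layers" (fixed scaling plus ReLU, 2x2 average pooling), and the weighted-sum fusion of the two branches are all assumptions made for illustration, since the abstract does not specify layer types or how the two contexts are combined.

```python
def conv_like(feature_map, weight):
    """Toy stand-in for a learned layer: scale each value, then apply ReLU."""
    return [[max(0.0, weight * v) for v in row] for row in feature_map]


def downsample(feature_map):
    """2x2 average pooling; assumes even height and width."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            (feature_map[i][j] + feature_map[i][j + 1]
             + feature_map[i + 1][j] + feature_map[i + 1][j + 1]) / 4.0
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


def shared_backbone(image):
    """First sub-network: return a shallow and a deep feature map."""
    low = conv_like(image, 0.5)               # shallow layer, full resolution
    high = conv_like(downsample(low), 2.0)    # deeper layer, coarser resolution
    return low, high


def appearance_branch(low):
    """Hypothetical appearance-context branch: reduce the shallow feature to a scalar."""
    return sum(sum(row) for row in low)


def semantic_branch(high):
    """Hypothetical semantic-context branch: reduce the deep feature to a scalar."""
    return sum(sum(row) for row in high)


def mcenet_like_count(image, w_app=0.5, w_sem=0.5):
    """Combine both context estimates; the weighted sum is an assumed fusion rule."""
    low, high = shared_backbone(image)
    return w_app * appearance_branch(low) + w_sem * semantic_branch(high)


if __name__ == "__main__":
    toy_image = [[1.0] * 4 for _ in range(4)]
    print(mcenet_like_count(toy_image))
```

The point of the sketch is only the structure: both branches consume outputs of the same first stage, taken at different depths, and their results are merged into a single count. In a real network each function would be a stack of trained layers.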
