Abstract
Recent deep-learning counting techniques address two distinct kinds of data: sparse data, which favors detection networks, and dense data, for which density map networks are used. Neither approach addresses a third scenario, in which dense objects are sparsely located. Raw aerial images exhibit sparse distributions of data in most situations. To address this issue, we propose a novel and highly portable end-to-end model, DisCountNet, along with an example dataset on which to test it. DisCountNet is a two-stage network that draws on ideas from both detection and heat-map networks to provide a simple yet powerful design. The first stage, DiscNet, performs coarse detection by converting a rich, high-resolution image into a sparse representation in which only important information is encoded. The second stage, CountNet, then operates on the dense regions of that sparse matrix to generate a density map, which provides fine locations and count predictions for the densely packed objects. Comparing the proposed network with current state-of-the-art networks, we find that it maintains competitive performance at a fraction of the computational complexity, resulting in a real-time solution.
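The two-stage idea described above can be illustrated with a minimal sketch. This is not the DisCountNet implementation: the learned DiscNet and CountNet stages are replaced by hypothetical stand-in functions (`mean_intensity` for the coarse selector and `identity_density` for the density estimator), and the tile size and threshold are assumptions chosen only for the toy input. It shows the control flow only: cut the image into tiles, keep the tiles a coarse score marks as interesting, then sum a density map over the kept tiles to obtain a count.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]        # 2-D grid of pixel values
Tile = Tuple[int, int, Image]    # (row offset, column offset, pixels)


def split_into_tiles(image: Image, tile: int) -> List[Tile]:
    """Cut the image into non-overlapping tile x tile patches."""
    tiles = []
    for r in range(0, len(image), tile):
        for c in range(0, len(image[0]), tile):
            patch = [row[c:c + tile] for row in image[r:r + tile]]
            tiles.append((r, c, patch))
    return tiles


def coarse_select(tiles: List[Tile],
                  score: Callable[[Image], float],
                  threshold: float) -> List[Tile]:
    """Stage 1 stand-in: keep only tiles whose score reaches the threshold."""
    return [t for t in tiles if score(t[2]) >= threshold]


def count_from_density(tiles: List[Tile],
                       density: Callable[[Image], Image]) -> float:
    """Stage 2 stand-in: the count is the sum of each tile's density map."""
    total = 0.0
    for _, _, patch in tiles:
        total += sum(sum(row) for row in density(patch))
    return total


def mean_intensity(patch: Image) -> float:
    """Hypothetical coarse score; a trained network would go here."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


def identity_density(patch: Image) -> Image:
    """Hypothetical density estimator; treats pixel values as density."""
    return patch


# Toy 4x4 "image": one occupied 2x2 region, the rest empty.
image = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

tiles = split_into_tiles(image, tile=2)
selected = coarse_select(tiles, mean_intensity, threshold=0.1)
count = count_from_density(selected, identity_density)
print(len(tiles), len(selected), count)
```

On this toy input, only one of the four tiles passes the coarse stage, so the second stage runs on a quarter of the pixels and returns a count of 2.0. The saving from skipping empty tiles is the intuition behind the reduced computation claimed above; the actual network learns both stages end to end.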
Highlights
Counting objects is a fine-grained scene-understanding problem that arises in many real-world applications, including counting people in crowded scenes and surveillance scenarios [1,2,3,4,5], counting vehicles [6], counting cells for cancer detection [7], and counting in agricultural settings for yield estimation and land use [8,9].
We propose a novel technique, influenced by both detection and density map networks and supported by specialized training techniques, in which coarse and fine detection occur.
Our approach differs from previous work in that it is fully automatic: regions of interest are selected in the first part of the network (DiscNet) without any manual cropping of imagery, and counting is performed in an end-to-end learning procedure on optical imagery.
Summary
Counting objects is a fine-grained scene-understanding problem that arises in many real-world applications, including counting people in crowded scenes and surveillance scenarios [1,2,3,4,5], counting vehicles [6], counting cells for cancer detection [7], and counting in agricultural settings for yield estimation and land use [8,9]. Counting questions are also among the most challenging in Visual Question Answering (VQA). Despite very promising results on “yes/no” and “what/where/who/when” questions, counting (“how many”) questions remain the hardest for VQA systems and show the lowest performance [10,11]. The emergence of micro Unmanned Aerial Vehicles (UAVs), featuring high flexibility, low cost, and high maneuverability, has brought the opportunity to build effective management systems. UAVs can access and survey large areas of land for data collection and translate this data into a user-friendly information source for managers.