Abstract
The number of trees and their spatial distribution are key information for forest management. In recent years, deep learning-based approaches have been proposed and have shown promising results in lowering the expensive labor cost of a forest inventory. In this paper, we propose a new efficient deep learning model called density transformer, or DENT, for automatic tree counting from aerial images. The architecture of DENT contains a multi-receptive field convolutional neural network to extract visual feature representations from local patches and their wide context, a transformer encoder to transfer contextual information across correlated positions, a density map generator to generate a spatial distribution map of trees, and a fast tree counter to estimate the number of trees in each input image. We compare DENT with a variety of state-of-the-art methods, including one-stage and two-stage, anchor-based and anchor-free deep neural detectors, and different types of fully convolutional regressors for density estimation. The methods are evaluated on a new large dataset we built and an existing cross-site dataset. DENT achieves top accuracy on both datasets, significantly outperforming most of the other methods. We have released our new dataset, called Yosemite Tree Dataset, containing a 10 km² rectangular study area with around 100k trees annotated, as a benchmark for public access.
Highlights
The density and distribution of forest trees are important information for ecologists to understand the ecosystem in certain regions
We propose a new method for tree counting called density transformer or DENT, which consists of a multi-receptive field (Multi-RF) convolutional neural network (CNN), a transformer, and two heads: Density Map Generator (DMG) and tree counter
We choose a rectangular study area, centered at Latitude 37.854, Longitude −119.548, in Yosemite National Park and build a benchmark dataset for tree counting based on RGB aerial images (Figure 6). The images are collected via Google Maps at 11.8 cm ground sampling distance (GSD) and stitched together
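To give a sense of scale, the short sketch below combines the 11.8 cm GSD stated here with the 10 km² area and the approximate 100k tree count from the abstract. The function names are hypothetical, and the results are back-of-the-envelope estimates only, since the tree count is reported as approximate.

```python
def pixels_for_area(area_km2, gsd_m):
    """Approximate number of pixels needed to cover an area at a given GSD."""
    area_m2 = area_km2 * 1e6          # 1 km^2 = 1,000,000 m^2
    pixel_area_m2 = gsd_m ** 2        # ground area covered by one square pixel
    return area_m2 / pixel_area_m2


def trees_per_hectare(n_trees, area_km2):
    """Average tree density, using 1 km^2 = 100 hectares."""
    return n_trees / (area_km2 * 100)


GSD_M = 0.118         # 11.8 cm per pixel, from the text
AREA_KM2 = 10.0       # study-area size, from the abstract
N_TREES = 100_000     # "around 100k trees", from the abstract (approximate)

total_pixels = pixels_for_area(AREA_KM2, GSD_M)
density = trees_per_hectare(N_TREES, AREA_KM2)

print(f"pixels in the stitched image: about {total_pixels:.2e}")
print(f"average density: about {density:.0f} trees per hectare")
print(f"average pixels per annotated tree: about {total_pixels / N_TREES:.0f}")
```

At this resolution the stitched image is on the order of several hundred million pixels, which is why counting is normally run on smaller image tiles rather than on the whole mosaic at once.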
Summary
The density and distribution of forest trees are important information for ecologists to understand the ecosystem in certain regions. One approach to object counting using deep neural networks (DNNs) is detection-based, i.e., to localize each individual object of interest first and then count the detections to get the total number. This is the mainstream approach among published tree counting methods [12,13,14,15,16,17,18]. We propose a new method for tree counting called density transformer, or DENT, which consists of a multi-receptive field (Multi-RF) convolutional neural network (CNN), a transformer, and two heads: Density Map Generator (DMG) and tree counter. Our contribution has two parts. The first part is the novel end-to-end approach for tree counting, using an efficient multi-receptive field CNN architecture for visual feature representation, a transformer for modeling the pair-wise interaction between the visual features, and two heads for outputs at different granularity and time costs. The second part is the new Yosemite Tree Dataset as a common benchmark for tree counting.
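As a rough illustration of how a density map relates to a tree count, the sketch below builds a density map from point annotations and recovers the count by summing it. This is a generic density-estimation example in pure Python, not DENT's implementation: the function names, the Gaussian kernel, and the `radius` and `sigma` values are assumptions chosen for illustration.

```python
import math


def gaussian_kernel(radius, sigma):
    """Return a (2*radius+1) x (2*radius+1) Gaussian kernel that sums to 1."""
    size = 2 * radius + 1
    kernel = [
        [math.exp(-((x - radius) ** 2 + (y - radius) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def density_map(points, height, width, radius=3, sigma=1.5):
    """Spread one unit of mass around each annotated tree location (row, col)."""
    kernel = gaussian_kernel(radius, sigma)
    dmap = [[0.0] * width for _ in range(height)]
    for row, col in points:
        # Collect the kernel cells that fall inside the image.
        cells = []
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = row + dy, col + dx
                if 0 <= y < height and 0 <= x < width:
                    cells.append((y, x, kernel[dy + radius][dx + radius]))
        if not cells:
            continue  # annotation lies entirely outside the image
        # Renormalize so a tree near the border still contributes exactly 1.
        mass = sum(w for _, _, w in cells)
        for y, x, w in cells:
            dmap[y][x] += w / mass
    return dmap


def count_from_density(dmap):
    """The estimated count is the sum (discrete integral) of the density map."""
    return sum(sum(row) for row in dmap)


# Hypothetical annotations: three trees in a 20 x 20 pixel tile.
trees = [(2, 2), (10, 15), (0, 0)]
dmap = density_map(trees, height=20, width=20)
print(round(count_from_density(dmap), 6))
```

Because each annotation contributes a total mass of one, the sum over any region of the map gives the expected number of trees in that region. That property is what lets a density map serve as both a spatial distribution map and a counting output.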