Abstract

The number of trees and their spatial distribution are key information for forest management. In recent years, deep learning-based approaches have shown promising results in lowering the high labor cost of a forest inventory. In this paper, we propose a new efficient deep learning model, called density transformer or DENT, for automatic tree counting from aerial images. The architecture of DENT contains a multi-receptive field convolutional neural network to extract visual feature representations from local patches and their wide context, a transformer encoder to transfer contextual information across correlated positions, a density map generator to generate a spatial distribution map of trees, and a fast tree counter to estimate the number of trees in each input image. We compare DENT with a variety of state-of-the-art methods, including one-stage and two-stage, anchor-based and anchor-free deep neural detectors, and different types of fully convolutional regressors for density estimation. The methods are evaluated on a new large dataset we built and on an existing cross-site dataset. DENT achieves top accuracy on both datasets, significantly outperforming most of the other methods. We have released our new dataset, called the Yosemite Tree Dataset, which covers a 10 km² rectangular study area with around 100k annotated trees, as a benchmark for public access.
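To make the idea of a density map concrete, here is a minimal sketch of the convention commonly used in density-estimation counting: each annotated tree contributes a normalized Gaussian blob, so the sum over the map equals the number of trees. This is an illustration under stated assumptions, not DENT's implementation. The function names, the `sigma` value, and the (row, column) point format are hypothetical, and the abstract describes DENT's tree counter as a separate head rather than a sum over the map.

```python
import math

def gaussian_density_map(points, height, width, sigma=2.0):
    """Build a density map from point annotations (assumed to lie inside the image).

    Each point adds a Gaussian kernel normalized over its in-bounds pixels,
    so every annotated tree contributes exactly 1 to the total.
    """
    density = [[0.0] * width for _ in range(height)]
    radius = int(3 * sigma)  # truncate the kernel at three standard deviations
    for row, col in points:
        weights = {}
        for r in range(max(0, row - radius), min(height, row + radius + 1)):
            for c in range(max(0, col - radius), min(width, col + radius + 1)):
                d2 = (r - row) ** 2 + (c - col) ** 2
                weights[(r, c)] = math.exp(-d2 / (2 * sigma ** 2))
        total = sum(weights.values())
        for (r, c), w in weights.items():
            density[r][c] += w / total
    return density

def count_from_density(density):
    """Estimate the object count as the sum of all density values."""
    return sum(sum(row) for row in density)

# Hypothetical annotations: three trees in a 32 x 32 pixel tile.
trees = [(5, 5), (10, 20), (0, 0)]
density = gaussian_density_map(trees, 32, 32)
estimated_count = count_from_density(density)
```

Normalizing each kernel over the in-bounds pixels keeps the count exact even for trees at the tile border, which is one common design choice among several.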

Highlights

  • The density and distribution of forest trees are important information for ecologists to understand the ecosystem in certain regions

  • We propose a new method for tree counting called density transformer or DENT, which consists of a multi-receptive field (Multi-RF) convolutional neural network (CNN), a transformer, and two heads: Density Map Generator (DMG) and tree counter

  • We choose a rectangular study area, centered at Latitude 37.854, Longitude −119.548, in Yosemite National Park and build a benchmark dataset for tree counting based on RGB aerial images (Figure 6). The images are collected via Google Maps at 11.8 cm ground sampling distance (GSD) and stitched together
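The figures quoted above (11.8 cm GSD, a 10 km² study area, around 100k annotated trees) imply a rough image size and tree density. The following back-of-the-envelope sketch works these out. The function and variable names are hypothetical, and since the tree count is only given as "around 100k", the results are approximate.

```python
def dataset_scale(gsd_m, area_km2, num_trees):
    """Derive rough scale figures from GSD, area, and annotation count."""
    area_m2 = area_km2 * 1_000_000            # 1 km^2 = 1,000,000 m^2
    pixel_area_m2 = gsd_m ** 2                # ground area covered by one pixel
    total_pixels = area_m2 / pixel_area_m2
    trees_per_hectare = num_trees / (area_m2 / 10_000)  # 1 ha = 10,000 m^2
    pixels_per_tree = total_pixels / num_trees
    return total_pixels, trees_per_hectare, pixels_per_tree

# Values taken from the text; the tree count is approximate.
total_pixels, trees_per_hectare, pixels_per_tree = dataset_scale(
    gsd_m=0.118, area_km2=10.0, num_trees=100_000
)
```

Under these assumptions the stitched image holds roughly 7.2 × 10⁸ pixels, with about 100 trees per hectare and on the order of 7,000 pixels of ground area per annotated tree.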


Summary

Introduction

The density and distribution of forest trees are important information for ecologists to understand the ecosystem in certain regions. One approach to object counting using deep neural networks (DNNs) is detection-based, i.e., to localize each individual object of interest first and then obtain the total number. This is the mainstream approach among published tree counting methods [12,13,14,15,16,17,18]. We propose a new method for tree counting called density transformer or DENT, which consists of a multi-receptive field (Multi-RF) convolutional neural network (CNN), a transformer, and two heads: Density Map Generator (DMG) and tree counter. The contribution has two parts. The first part is the novel end-to-end approach for tree counting, using an efficient multi-receptive field CNN architecture for visual feature representation, a transformer for modeling the pair-wise interaction between the visual features, and two heads for outputs at different granularities and time costs. The second part is the new Yosemite Tree Dataset as a common benchmark for tree counting.
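The "pair-wise interaction between the visual features" that a transformer models is typically computed with scaled dot-product self-attention. The sketch below shows that mechanism in plain Python under simplifying assumptions: a single head and identity projections (queries, keys, and values are all the input features), with no learned weights. It is not DENT's encoder, and the function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(features):
    """Single-head scaled dot-product self-attention with Q = K = V = features.

    `features` is a list of equal-length vectors, one per spatial position.
    Each output vector is a weighted average of all input vectors, with
    weights given by the pairwise similarity between positions.
    """
    dim = len(features[0])
    outputs = []
    for query in features:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in features
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * value[j] for w, value in zip(weights, features))
            for j in range(dim)
        ])
    return outputs

# Hypothetical 2-D features at three positions of a CNN feature map.
feature_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(feature_vectors)
```

Because every position attends to every other position, the cost grows quadratically with the number of positions, which is one reason such encoders are usually applied to a downsampled CNN feature map rather than to raw pixels.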

Transformers
Density Estimation
Object Detection
Multi-Receptive Field Network
Transformer Encoder
Tree Counter
Yosemite Tree Dataset
NeonTreeEvaluation Dataset
Evaluation Metric
Comparison to State-of-Art Methods
Technical Details
Ablation Study
Inference Time
Conclusions and Future Work
