Abstract

Crop maps are key inputs for crop inventory production and yield estimation and can inform the implementation of effective farm management practices. Producing these maps at detailed scales requires exhaustive field surveys that can be laborious, time-consuming, and expensive to replicate. With a growing archive of remote sensing data, there are enormous opportunities to exploit dense satellite image time series (SITS), temporal sequences of images over the same area. Generally, crop type mapping relies on single-sensor inputs and is solved with the help of traditional learning algorithms such as random forests or support vector machines. Nowadays, deep learning techniques have brought significant improvements by leveraging information in both spatial and temporal dimensions, which are relevant in crop studies. The concurrent availability of Sentinel-1 (synthetic aperture radar) and Sentinel-2 (optical) data offers a great opportunity to utilize them jointly; however, optimizing their synergy has been understudied with deep learning techniques. In this work, we analyze and compare three fusion strategies (input, layer, and decision levels) to identify the strategy that best optimizes optical-radar classification performance. They are applied to a recent architecture, the pixel-set encoder–temporal attention encoder (PSE-TAE), developed specifically for object-based classification of SITS and based on self-attention mechanisms. Experiments are carried out in Brittany, in the northwest of France, with Sentinel-1 and Sentinel-2 time series. Input- and layer-level fusion competitively achieved the best overall F-score, surpassing decision-level fusion by 2%. On a per-class basis, decision-level fusion increased the accuracy of dominant classes, whereas layer-level fusion improved minority classes by up to 13%.
Against single-sensor baselines, the multi-sensor fusion strategies identified crop types more accurately: for example, input-level fusion outperformed Sentinel-2 and Sentinel-1 by 3% and 9% in F-score, respectively. We also conducted experiments that showed the importance of fusion for early time series classification and under high cloud cover conditions.
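As a rough illustration of the three fusion levels compared in the abstract, the sketch below uses toy stand-ins (a mean-pooling `encoder` and a fixed-weight softmax `classify`, both hypothetical placeholders, not the paper's PSE-TAE branches or trained classifier) to show where each strategy merges the radar and optical streams:

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, C = 10, 4, 3  # time steps, features per sensor, crop classes

def encoder(x):
    """Toy stand-in for a PSE-TAE branch: time series (T, D) -> feature vector."""
    return x.mean(axis=0)

def classify(feat):
    """Toy stand-in linear classifier + softmax over C crop classes."""
    W = np.ones((feat.size, C)) * np.linspace(0.1, 0.3, C)  # fixed toy weights
    logits = feat @ W
    e = np.exp(logits - logits.max())
    return e / e.sum()

s1 = rng.normal(size=(T, D))  # radar (Sentinel-1) time series, already aligned
s2 = rng.normal(size=(T, D))  # optical (Sentinel-2) time series

# 1) input-level: concatenate channels, then one shared encoder
p_input = classify(encoder(np.concatenate([s1, s2], axis=1)))

# 2) layer-level: one encoder per sensor, concatenate the feature vectors
p_layer = classify(np.concatenate([encoder(s1), encoder(s2)]))

# 3) decision-level: a full branch per sensor, then average the probabilities
p_decision = (classify(encoder(s1)) + classify(encoder(s2))) / 2
```

The only difference between the three variants is how early the two sensor streams are merged; all three end in a single class-probability vector.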

Highlights

  • Causal factors such as climate change have a high likelihood to threaten food security at global, regional, and local levels [1]

  • Our study extends these research works by investigating more forms of fusion along with an advanced deep learning architecture, namely the pixel-set encoder–temporal attention encoder (PSE-TAE), and proposes the optimal level of synergy in this setup between Sentinel-1 and Sentinel-2 for crop classification

  • The pixel-set encoder (PSE) fusions use nearest-neighbor interpolation, whereas the late fusion strategy averages the class probabilities
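The two mechanisms named in the last highlight can be sketched as follows. This is a minimal illustration, not the paper's code: `nearest_neighbor_resample` and `decision_level_fusion`, along with the toy dates and values, are hypothetical names and data chosen here to show nearest-neighbor temporal alignment of one sensor's series onto the other's acquisition dates, and late fusion as an average of per-sensor class probabilities:

```python
import numpy as np

def nearest_neighbor_resample(src_dates, src_values, target_dates):
    """For each target date, take the temporally closest sample of the source series."""
    src_dates = np.asarray(src_dates, dtype=float)
    idx = np.array([np.argmin(np.abs(src_dates - t)) for t in target_dates])
    return src_values[idx]

def decision_level_fusion(probs_a, probs_b):
    """Late fusion: average the per-sensor class probability vectors."""
    return (np.asarray(probs_a) + np.asarray(probs_b)) / 2.0

# toy example: one series sampled on days 1/7/13/19, resampled to days 2/12/17
s1_dates = [1, 7, 13, 19]
s1_values = np.array([[0.10], [0.20], [0.30], [0.40]])
aligned = nearest_neighbor_resample(s1_dates, s1_values, [2, 12, 17])
# day 2 -> day 1, day 12 -> day 13, day 17 -> day 19
```

Averaging probabilities keeps the fused output a valid distribution whenever both inputs are, which is why no renormalization step is needed after late fusion.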


Introduction

Causal factors such as climate change have a high likelihood to threaten food security at global, regional, and local levels [1]. Recent reports reveal that agriculture absorbs 26% of the economic impact of climate-induced disasters, which rises to more than 80% for drought in developing countries [2]. The agricultural sector is impacted by changing climates but contributes about 24% of greenhouse gas (GHG) emissions together with forestry and other land use [3]. Warmer temperatures and carbon dioxide presence can stimulate crop growth [4], especially in temperate regions, but changing climates may also have dire consequences on crop productivity [5]. Remote sensing has become an integral tool supporting the monitoring and management of agriculture as well as efforts to mitigate climate change.

