The spatial and temporal coverage of spaceborne optical imaging systems is well suited to automated marine litter monitoring. However, developing machine learning-based detection and identification algorithms requires large amounts of data, and ground-validated data on marine debris is scarce. In this study, we propose a general methodology that leverages synthetic data to avoid overfitting and to generalize well. The idea is to use realistic models of spaceborne optical image acquisition and of marine litter to generate large amounts of data for training machine learning algorithms. The trained algorithms can then be used to detect marine pollution automatically in real satellite images. The main contribution of our study is to show that algorithms trained on simulated data can be successfully transferred to real-life situations. We present the general components of our framework, our modeling of satellites and marine debris, and a proof-of-concept implementation for macro-plastic detection with Sentinel-2 images. In this case study, we generated a large dataset (more than 16,000 pixels of marine debris) composed of seawater, plastic, and wood and trained a Random Forest classifier on it. This classifier, when tested on real satellite images, successfully discriminates marine litter from seawater, thus demonstrating the effectiveness of our approach and paving the way for machine learning-based marine litter detection with even more representative simulation models.
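The train-on-simulation idea can be illustrated with a minimal Python sketch using scikit-learn. It is not the simulation framework of this study: the function `simulate_pixels`, the flat per-class reflectance values, and the noise level are hypothetical placeholders chosen only to make the three classes separable, and the held-out pixels here are also synthetic, whereas the study evaluates on real Sentinel-2 images.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

CLASSES = ["seawater", "plastic", "wood"]
N_BANDS = 13  # Sentinel-2 MSI acquires 13 spectral bands


def simulate_pixels(n_per_class, rng, noise=0.01):
    """Draw toy pixel spectra: a flat placeholder reflectance plus Gaussian noise.

    ASSUMPTION: these mean reflectances are illustrative only. A realistic
    pipeline would replace them with sensor and marine litter models.
    """
    means = {"seawater": 0.03, "plastic": 0.12, "wood": 0.08}
    features, labels = [], []
    for label, name in enumerate(CLASSES):
        spectra = means[name] + rng.normal(0.0, noise, size=(n_per_class, N_BANDS))
        features.append(spectra)
        labels.append(np.full(n_per_class, label))
    return np.vstack(features), np.concatenate(labels)


rng = np.random.default_rng(0)
X_train, y_train = simulate_pixels(500, rng)  # synthetic training pixels
X_test, y_test = simulate_pixels(100, rng)    # synthetic held-out pixels

# Train a Random Forest on the simulated spectra, one row per pixel.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

# Mean accuracy on the held-out synthetic pixels.
accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy on synthetic pixels: {accuracy:.3f}")
```

To mirror the transfer step described above, `X_test` would be replaced by per-pixel band values extracted from real satellite images, with labels taken from ground-validated observations.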