A heterogeneous double ensemble algorithm for soybean planting area extraction in Google Earth Engine

Shuo Wang, Wei Feng, Yinghui Quan, Qiang Li, Gabriel Dauphin, Wenjiang Huang, Jing Li, Mengdao Xing

doi:10.1016/j.compag.2022.106955

Abstract

Soybeans are one of the main crops grown in the United States. Knowing the distribution of soybean cultivation areas is crucial for ensuring food security, eradicating hunger, and adjusting crop structures. However, traditional methods of extracting soybean planting areas consume considerable manpower and material resources and take a long time. The emergence of high-resolution images, such as Sentinel-2A (S2A), enables the identification of soybean at the field scale, and these images can be applied on a large scale with the support of cloud computing technology. This work proposes a heterogeneous double ensemble algorithm to extract soybean planting areas. The study uses the crop type dataset from the U.S. Department of Agriculture and the S2A dataset. The Normalized Difference Vegetation Index (NDVI) and Normalized Difference Water Index (NDWI), calculated from S2A data, are used to improve classification accuracy. The proposed method consists of the following steps. First, the S2A data are processed according to phenological information and spectral characteristics. Second, texture features obtained from the grayscale matrix are integrated with spectral features. Third, to remove useless features and improve classification efficiency, only the important bands identified by feature importance analysis are retained for the next step. Fourth, Random Forest (RF), Classification And Regression Tree (CART), and Support Vector Machine (SVM) classifiers serve as base classifiers and are trained on the retained features. Finally, result maps are obtained by “voting” on the three classification results. Three study areas, Guthrie in Iowa, Clinton in Indiana, and Cuming in Nebraska, are used to validate the effectiveness of the proposed method. Numerical experiments show that the proposed steps improve classification performance. Compared with the reference methods, the average increase in overall accuracy obtained by the proposed algorithm is 1.4%, 3.2%, and 1.7% on the Guthrie, Clinton, and Cuming data, respectively.
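The index computation and the final voting step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' Google Earth Engine implementation. The function names are hypothetical, the NDWI shown is the green/NIR form (the abstract does not say which NDWI variant is used), and the tie-breaking rule (fall back to the first classifier when no label has a strict majority) is an assumption made only for illustration.

```python
from collections import Counter


def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(nir, red):
    # NDVI = (NIR - Red) / (NIR + Red)
    return normalized_difference(nir, red)


def ndwi(green, nir):
    # Assumed variant: NDWI = (Green - NIR) / (Green + NIR)
    return normalized_difference(green, nir)


def majority_vote(*label_maps):
    """Per-pixel majority vote over equally long label sequences.

    Each argument holds the labels predicted by one base classifier
    (for example RF, CART, and SVM) for the same pixels.
    """
    voted = []
    for labels in zip(*label_maps):
        top_label, top_count = Counter(labels).most_common(1)[0]
        if top_count > len(labels) / 2:
            voted.append(top_label)
        else:
            # Assumption: no strict majority, so keep the first classifier's label.
            voted.append(labels[0])
    return voted


# Hypothetical predictions for three pixels (1 = soybean, 0 = other).
rf_labels = [1, 0, 1]
cart_labels = [1, 1, 0]
svm_labels = [0, 0, 0]
result = majority_vote(rf_labels, cart_labels, svm_labels)
```

With three base classifiers and a binary soybean/other labelling, a strict majority always exists, so the tie-breaking branch only matters for multi-class maps.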

Full Text