Satellite time series data, widely used for land cover classification, often contain missing values due to cloud contamination, which can negatively affect classification. Numerous strategies have been developed to reconstruct the missing values and produce regular time series for machine learning classifiers, among which compositing followed by linear interpolation is the most widely used. However, whether linear interpolation actually improves land cover classification has not been examined. Recently developed deep learning models such as long short-term memory (LSTM) and Transformer allow such an examination because they can classify time series with missing values. In this study, we compared time series composites with missing values (without linear interpolation) and linearly interpolated time series composites (without missing values) for land cover classification. About 18 thousand Harmonized Landsat Sentinel-2 (HLS) images acquired over the Amur River Basin of China (890,308 km²) in 2021 were composited into 14 16-day periods. Two time series composites were classified: (i) the 16-day composites without interpolation, in which on average 15.35% of the 16-day periods had missing values, and (ii) the linearly interpolated 16-day composites with no missing values. The classifications showed that (1) the overall accuracy difference between classifications with and without linear interpolation was < 0.2% for the bidirectional LSTM (Bi-LSTM) and < 0.5% for the Transformer, both of which were smaller than the variation due to model training randomness; and (2) computation time can be saved by using composites without linear interpolation. The findings suggest that the time-consuming linear interpolation is unnecessary in Bi-LSTM and Transformer-based land cover classifications. The findings were confirmed by experiments on sensitivity to the number of cloud-free composites and to different classification legends using crop type classifications.
This implies that linear interpolation cannot reconstruct reliable time series for land cover classification, and that its historical use has been more about mitigating the inability of traditional classifiers to handle missing values than about improving classification. Linear interpolation is not necessary for LSTM and Transformer models, which are able to handle missing values. The training datasets and the code developed in this study are publicly available.
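The two inputs compared above can be illustrated with a minimal sketch for a single pixel and band. This is not the study's released code: the reflectance values are made up, the function names are hypothetical, filling leading and trailing gaps with the nearest valid value is an assumption, and passing a boolean mask alongside zero-filled values is only one plausible way to feed missing observations to a Bi-LSTM or Transformer.

```python
from typing import List, Optional, Tuple

N_PERIODS = 14  # number of 16-day compositing periods stated in the abstract


def linear_interpolate(series: List[Optional[float]]) -> List[float]:
    """Fill None gaps by linear interpolation between the nearest valid
    neighbours. Leading and trailing gaps take the nearest valid value
    (an assumption; the abstract does not say how edges are handled)."""
    valid = [i for i, v in enumerate(series) if v is not None]
    if not valid:
        raise ValueError("series has no valid observations")
    filled: List[float] = []
    for i, v in enumerate(series):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            filled.append(float(series[right]))
        elif right is None:
            filled.append(float(series[left]))
        else:
            weight = (i - left) / (right - left)
            filled.append(series[left] + weight * (series[right] - series[left]))
    return filled


def mask_missing(
    series: List[Optional[float]], fill_value: float = 0.0
) -> Tuple[List[float], List[bool]]:
    """Keep the gaps: return placeholder-filled values plus a mask that is
    True where the composite is missing (hypothetical input convention)."""
    values = [fill_value if v is None else float(v) for v in series]
    mask = [v is None for v in series]
    return values, mask


if __name__ == "__main__":
    # Made-up reflectance values for one pixel; None marks a cloudy period.
    series = [None, 0.10, 0.12, None, None, 0.30, 0.35,
              0.33, None, 0.20, 0.15, None, 0.08, None]
    assert len(series) == N_PERIODS
    print("interpolated:", linear_interpolate(series))
    values, mask = mask_missing(series)
    print("values:", values)
    print("missing mask:", mask)
```

The interpolated series is what a traditional classifier would require, while the value and mask pair preserves the information about which periods were actually observed.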