The Landsat archive is one of the richest Earth observation datasets available and provides long-term data at fairly high temporal and spatial resolution globally. Temporal aggregation is frequently used to condense individual observations into a more digestible feature space that provides spatially gap-free data, as required by many processing strategies that rely on homogeneous data coverage across large areas, e.g., machine learning-based land cover classification. Spectral Temporal Metrics (STMs) represent a conceptually simple feature space wherein multiple observations are temporally aggregated by statistical functions. The quality and inter-annual consistency of STMs are affected by data availability, including usable clear-sky observations, which vary in time and space due to satellite lifecycles, sensor failures, changes of observation modes, climate regimes, orbital overlaps, and inter-annual variability of cloud cover. However, the relationship between data availability and the consistency of STMs between years is still poorly understood, especially because differences in STMs between years can result both from inter-annual variability in data availability and from inter-annual variability of land surfaces. In this study, we systematically quantify the effect of inter-annual data availability on annual STMs for the years 1984–2019 while completely controlling for inter-annual land surface changes. Our results are expected to help assess where on Earth, and in which periods, specific metrics can be used or should be avoided when multi-annual consistency is required. We synthesized a global, nearly gap-free reference time series at daily temporal resolution from MODIS data. This “baseline” was subsequently degraded with actual annual Landsat mission observation scenarios, resulting in synthetic annual time series that differ only with respect to data availability.
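The degradation experiment can be illustrated with a minimal Python sketch for a single pixel-year. It assumes a daily baseline stored as a list indexed by 0-based day of year, a hypothetical list of clear-sky acquisition days standing in for one Landsat observation scenario, and an illustrative set of STMs (mean, standard deviation, quartiles); the study's actual metric set, band handling, and MODIS-based baseline are not reproduced here. Because the degraded series is sampled from the baseline itself, any nonzero STM error stems from data availability alone.

```python
import math
import statistics
from typing import Dict, List, Sequence

def spectral_temporal_metrics(values: Sequence[float]) -> Dict[str, float]:
    """Aggregate one pixel's reflectance time series into a few example STMs."""
    if len(values) < 2:
        raise ValueError("at least two observations are needed")
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "avg": statistics.fmean(values),
        "std": statistics.stdev(values),
        "p25": q25,
        "p50": q50,
        "p75": q75,
    }

def degrade(baseline: Sequence[float], observed_days: Sequence[int]) -> List[float]:
    """Keep only the baseline values on days with a clear-sky observation."""
    return [baseline[d] for d in sorted(set(observed_days))]

def stm_errors(baseline: Sequence[float], observed_days: Sequence[int]) -> Dict[str, float]:
    """STM error = metric from the degraded series minus metric from the baseline."""
    reference = spectral_temporal_metrics(baseline)
    degraded = spectral_temporal_metrics(degrade(baseline, observed_days))
    return {name: degraded[name] - reference[name] for name in reference}

# Hypothetical smooth NIR reflectance curve standing in for the daily MODIS-based baseline.
baseline = [0.25 + 0.10 * math.sin(2 * math.pi * (d - 100) / 365) for d in range(365)]
# Hypothetical clear-sky acquisition days (0-based day of year) for one pixel-year.
observed = [10, 50, 120, 180, 200, 260, 330]
print(stm_errors(baseline, observed))
```

Repeating this for every year's observation scenario, while keeping the same baseline, yields annual STM errors that can be compared between years without any land surface change.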
Based on the differences between STMs generated from the baseline and STMs computed from the degraded time series, we statistically quantified the accuracy, precision, and uncertainty (APU) of various STMs across the Landsat spectral bands. We compared the performance against a reasonable specification, i.e., a tolerated error. We aggregated the APU metrics by climate zone for each year to carve out regional and temporal differences. We found large regional differences, with the highest quality and consistency in arid climates (i.e., APU within specification). Errors in fully humid snow climates are high, yet systematic (biased but repeatable), whereas equatorial and temperate climates are characterized by unbiased but uncertain metrics, where accuracy, or precision and uncertainty, can exceed the specification by a factor of three or more. Quality generally increased over time in response to improved observation modes and data storage commitments; e.g., the uncertainty of the near infrared average improved from one sensor availability period to the next in >90% of all climate zones, with the exception of the Landsat 7 scan line corrector failure in 2003, after which quality decreased again in 62% of climate zones. We also derived and tested different measures of STM quality and found that the seasonal distribution of clear-sky observations is more important than the number of observations; e.g., in Cfb climates, the accuracy of the near infrared standard deviation can be explained with an R² of 0.55 by the number of observations, and with an R² of 0.78 by the maximum time between subsequent observations.
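The abstract does not spell out the APU formulas. A common convention, assumed here, defines accuracy as the mean error (bias), precision as the standard deviation of the errors around that bias, and uncertainty as the root-mean-square error. The sketch below applies these definitions to a list of STM errors, e.g., one error per pixel and year within a climate zone; the tolerated error `SPEC` and the helper `exceedance_factor` are hypothetical illustrations, not the study's specification.

```python
import math
from typing import Sequence, Tuple

SPEC = 0.01  # hypothetical tolerated error (reflectance units), not the study's value

def apu(errors: Sequence[float]) -> Tuple[float, float, float]:
    """Accuracy (mean bias), precision (std. dev. around the bias), uncertainty (RMSE)."""
    n = len(errors)
    if n < 2:
        raise ValueError("at least two error samples are needed")
    accuracy = sum(errors) / n
    precision = math.sqrt(sum((e - accuracy) ** 2 for e in errors) / (n - 1))
    uncertainty = math.sqrt(sum(e * e for e in errors) / n)
    return accuracy, precision, uncertainty

def exceedance_factor(value: float, spec: float = SPEC) -> float:
    """How many times a metric exceeds the tolerated error (<= 1 means within specification)."""
    return abs(value) / spec

# Systematic error: biased but repeatable (high accuracy error, zero precision error).
print(apu([0.03, 0.03, 0.03]))
# Zero-mean error: unbiased but uncertain (zero bias, nonzero precision and uncertainty).
print(apu([-0.02, 0.02, -0.02, 0.02]))
```

The two example inputs mirror the error patterns described above for snow climates and for equatorial and temperate climates, respectively.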
Furthermore, our findings revealed how many observations are needed, or how short the largest gap between consecutive observations must be, to still produce reliable metrics (e.g., a maximum gap of 42–45 days to obtain a tolerated uncertainty of the near infrared average and standard deviation in Cfb climates), which has substantial implications for the design of downstream applications relying on multi-annual STMs. This study provides the tools for a global and systematic assessment of inter-annual STM consistency while controlling for land surface dynamics, and thereby paves the way for systematic error quantification in Level 3 products.
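The two data availability measures discussed above, the number of observations and the maximum time between subsequent observations, can be sketched as follows, together with the R² of a simple linear regression that could relate such a measure to an APU metric. Whether the gaps from the start of the year to the first observation and from the last observation to the end of the year count toward the maximum gap is not stated in the abstract, so this is exposed as the hypothetical option `include_year_edges`. The 42-day threshold below is only the lower end of the range reported for the near infrared average and standard deviation in Cfb climates and should not be transferred to other metrics or climate zones.

```python
from typing import Iterable, Sequence, Tuple

def availability_measures(
    obs_days: Iterable[int],
    year_length: int = 365,
    include_year_edges: bool = False,  # hypothetical option, see lead-in
) -> Tuple[int, int]:
    """Return (number of observations, maximum gap in days) for one pixel-year."""
    days = sorted(set(obs_days))
    n_obs = len(days)
    if n_obs == 0:
        return 0, year_length  # no data: treat the whole year as one gap
    points = [0, *days, year_length] if include_year_edges else days
    gaps = [later - earlier for earlier, later in zip(points, points[1:])]
    # A single observation without year edges has no gap; treat it conservatively.
    max_gap = max(gaps) if gaps else year_length
    return n_obs, max_gap

def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Coefficient of determination of a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Hypothetical clear-sky acquisition days (0-based day of year) for one pixel-year.
n_obs, max_gap = availability_measures([5, 40, 90, 110, 160, 230, 250, 300, 340])
TOLERATED_MAX_GAP = 42  # lower end of the 42-45 day range reported for Cfb NIR avg/std
print(n_obs, max_gap, max_gap <= TOLERATED_MAX_GAP)  # prints: 9 70 False
```

In this example, nine observations would not suffice because a single 70-day gap remains, which illustrates why the seasonal distribution of observations can matter more than their number.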