Abstract

The National Aeronautics and Space Administration’s (NASA) Atmospheric Infrared Sounder (AIRS) has been collecting large quantities of remote sensing data about the vertical structure of temperature, water vapor, and clouds in the Earth’s atmosphere since its launch aboard the Aqua spacecraft in mid-2002. These data are both global and high resolution, so they are uniquely able to provide distributional information about second- and higher-order interactions that are at the heart of understanding climate processes. However, these data are so large and complex that the structures of interest are not directly accessible without some form of data reduction, and data reduction is particularly problematic because of the way the data are staged and stored. Thus, AIRS data pose a classic problem in the analysis of modern massive datasets: how can we quantify and understand global distributional relationships in datasets that can be worked with only in small pieces? Our approach is to hierarchically reduce the data in a way that preserves distributional characteristics across subsets formed by stratifying on intuitively meaningful variables. By exploring how distributions change as functions of the stratification variables, we gain insight into the processes generating the data. In this article, we describe our implementation of a methodology first proposed in two earlier papers. Here, we have operationalized the algorithm and demonstrated that it is practical within the NASA data processing pipeline and that it leads to important scientific insights that are only possible when massive datasets are reduced in a way that respects the structure of multivariate distributions.
