Abstract

At the dawn of the 21st century, environmental scientists are collecting more data more rapidly than at any time in the past. Nowhere is this change more evident than in the advent of sensor networks able to collect and process, in real time, simultaneous measurements over broad areas and at high sampling rates. At the same time, there has been great progress in the development of standards, methods, and tools for data analysis and synthesis, including a new standard for descriptive metadata for ecological datasets (Ecological Metadata Language, EML) and new workflow tools that help scientists to assemble datasets and to diagram, record, and execute analyses. However, these developments, important as they are, are not yet sufficient to guarantee the reliability of datasets created by a scientific process, that is, the complex activity that scientists carry out in order to create a dataset. We define a dataset to be reliable when the scientific process used to create it is (1) reproducible and (2) analyzable for potential defects. To address this problem, we propose the use of an analytic web, a formal representation of a scientific process that consists of three coordinated graphs (a data-flow graph, a dataset-derivation graph, and a process-derivation graph) originally developed for use in software engineering. An analytic web meets the two key requirements for ensuring dataset reliability: (1) a complete audit trail of all artifacts (e.g., datasets, code, models) used or created in the execution of the scientific process that created the dataset, and (2) detailed process metadata that precisely describe all sub-processes of the scientific process. Construction of such metadata requires the semantic features of a high-level process definition language. In this paper, we illustrate the use of an analytic web to represent the scientific process of constructing estimates of ecosystem water flux from data gathered by a complex, real-time multi-sensor network. We use Little-JIL, a high-level process definition language, to precisely and accurately capture the analytical processes involved. We believe that incorporating this approach into existing tools and evolving metadata specifications (such as EML) will yield significant benefits to science. These benefits include complete and accurate representations of scientific processes; support for rigorous evaluation of such processes for logical and statistical errors and for propagation of measurement error; and assurance of dataset reliability for developing sound models and forecasts of environmental change.
