Abstract

AI (artificial intelligence)-based analysis of geospatial data has gained a lot of attention. Geospatial datasets are multi-dimensional; have spatiotemporal context; exist in disparate formats; and require sophisticated AI workflows that include not only the AI algorithm training and testing, but also data preprocessing and result post-processing. This complexity poses a huge challenge when it comes to full-stack AI workflow management, as researchers often use an assortment of time-intensive manual operations to manage their projects. However, none of the existing workflow management software provides a satisfying solution on hybrid resources, full file access, data flow, code control, and provenance. This paper introduces a new system named Geoweaver to improve the efficiency of full-stack AI workflow management. It supports linking all the preprocessing, AI training and testing, and post-processing steps into a single automated workflow. To demonstrate its utility, we present a use case in which Geoweaver manages end-to-end deep learning for in-time crop mapping using Landsat data. We show how Geoweaver effectively removes the tedium of managing various scripts, code, libraries, Jupyter Notebooks, datasets, servers, and platforms, greatly reducing the time, cost, and effort researchers must spend on such AI-based workflows. The concepts demonstrated through Geoweaver serve as an important building block in the future of cyberinfrastructure for AI research.

Highlights

  • Artificial intelligence (AI) is profoundly reshaping the scientific landscape [1]

  • The results prove that the framework can bring great benefits to the AI community to build, run, monitor, share, track, modify, reproduce, and reuse their AI workflows in either a single-machine environment or a distributed environment

  • Given the complex architectures common in machine learning, for example, in neural networks with sometimes hundreds or thousands of layers weighted with parameters that are not humanly interpretable, AI techniques are contributing to a widening crisis in repeatability, replicability, and reproducibility [76,77,78,79,80]

Read more

Summary

Introduction

Artificial intelligence (AI) is profoundly reshaping the scientific landscape [1]. Adoption of AI techniques has brought a substantial revolution in how we collect, store, process, and analyze legacy and current datasets [2,3,4,5]. Rasp et al trained a deep neural network to represent all atmospheric subgrid processes in a climate model by learning from a multiscale model [30]. Bergen et al pointed out that the use of machine learning in geosciences could mainly focus on (1) performing the complex predictive tasks that are difficult for numeric models; (2) directly modeling the processes and interactions by approximating numerical simulations; and (3) revealing new and often unanticipated patterns, structures, or relationships [23]. The existence of Geoweaver could serve as a fundamental tool for the future AI workflow management and boost the practical application of AI in the geosciences

Big Spatial Data and AI
Workflow Management Software
Function Chain
Error Control
Provenance
Big Data Processing Workflow
Process
Workflow
Findings
Discussion
Conclusions
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call