A reusable scientific workflow for conservation planning

Siddeswara Mayura Guru, Craig E Franklin, Matthew Watts, Ross G Dwyer, Tim Clancy, Hamish A Campbell, Hoang Anh Nguyen, Minh Ngoc Dinh, David Abramson, Hugh P Possingham

doi:10.36334/modsim.2015.f13.guru

Abstract

Complex scientific data analysis generally requires multiple software tools and skill sets. These analyses can involve collaborations between scientific and technical communities, with expertise in problem formulation and in the use of tools and programming languages. While such collaborations are useful for solving a given problem, the transferability and productivity of the approach are low, and reuse requires considerable assistance from the original tool developers. Any complex scientific data analysis involves accessing and refining large volumes of data, running simulations and algorithms, and visualising results. These steps can incorporate a variety of tools and programming languages, and can be constructed as a series of activities to achieve a desired outcome. This is where scientific workflows are very useful. Scientific workflows abstract complex analyses into a series of inter-dependent computational steps that lead to a solution for a scientific problem. Once constructed, the workflow can be executed repeatedly and the results reproduced with minimal assistance from the original tool developers. This improves transferability, repeatability and productivity, and reduces costs by reusing workflow components for similar problems with different datasets. Kepler is a popular open-source scientific workflow tool for designing, executing, archiving and sharing workflows. It can couple disparate execution environments on a single platform. For example, users can run analysis steps written in Python, R and Matlab on a single platform as part of a single analysis and synthesis experiment. Kepler provides a wide variety of reusable components that perform various tasks, including data access from databases, remote systems, file systems, web services and data servers, and can execute these processes in a local or distributed environment.
Together these functionalities provide greater flexibility for researchers to undertake complex scientific analyses compared with traditional homogeneous environments. In this paper, we describe a new scientific workflow based on Kepler that automates data analysis tasks for Marxan, a widely used conservation planning software package. Marxan is used by over 4,200 active users in more than 180 countries to identify gaps in biodiversity protection, identify cost-effective areas for conservation investment and inform multiple-use zoning. Its use is expanding rapidly, and this new functionality will improve the application of Marxan to various conservation planning problems. A Kepler workbench has been extended to provide functionality to invoke Marxan and execute it within a distributed environment using Nimrod/K. Our aim was to develop a reproducible, reusable workflow to generate conservation planning scenarios on the Kepler platform. The workflow components include data acquisition and pre-processing, construction of planning scenarios, generation of efficient solutions to the complex problem formulations and visualisation of outputs. The workflow components are shared so that they can be reused and reconfigured to design and simulate other conservation planning applications. We also present a use case that demonstrates how a Kepler Marxan workflow can be used to design and implement conservation planning computational simulation experiments.

Full Text