Abstract

Scientific workflows and their supporting systems are becoming increasingly popular for compute-intensive and data-intensive scientific experiments. The advantages scientific workflows offer include rapid and easy workflow design, software and data reuse, scalable execution, sharing and collaboration, and other features that together facilitate “reproducible science”. In this context, provenance – information about the origin, context, derivation, ownership, or history of some artifact – plays a key role, since scientists are interested in examining and auditing the results of scientific experiments. However, performing such analyses on scientific results as part of extended research collaborations requires an adequate environment and tools. Concretely, the need arises for a repository that facilitates the sharing of scientific workflows and their associated execution traces in an interoperable manner, while also enabling querying and visualization. Furthermore, such functionality should be supported while taking performance and scalability into account. With this purpose in mind, we introduce PBase: a scientific workflow provenance repository implementing the proposed ProvONE standard, which extends the emerging W3C PROV standard for provenance data with workflow-specific concepts. PBase is built on the Neo4j graph database, thus offering capabilities such as declarative and efficient querying. Our experiences demonstrate the power gained by supporting various types of queries for provenance data. In addition, PBase is equipped with a user-friendly interface tailored for the visualization of scientific workflow provenance data, making the specification of queries and the interpretation of their results easier and more effective.

Highlights

  • The origin and processing history of an artifact is known as its provenance

  • The need arises for a repository allowing multiple users to store and query scientific workflow provenance in an interoperable manner

  • As a proof-of-concept user interface aimed at end users, we developed a Web-based GUI that enables users to upload a provenance trace, visualize the workflow alongside its various traces, issue queries, and obtain visualizations of their results


Summary

Introduction

The origin and processing history of an artifact is known as its provenance. Data provenance is an important form of metadata that explains how a particular data product was generated, including the system and the steps in the computational process involved, the user responsible for its execution, the time, and the resources used, such as parameter settings, input data, and software tools. Scientific workflow systems provide a user-friendly graphical interface to specify a computational process in the form of a directed graph of interconnected tasks. Such a graph is an abstraction that can be regarded as prospective provenance, since it details the steps to follow in order to generate the desired result.

Our prototype only enables the creation of a database from a VisTrails exported XML trace file, which contains the workflow specification and the various traces corresponding to the workflow runs. This file is submitted to PBase through the TraceUpload component of the Web client as an HTTP POST request, which includes a user-supplied identifier for the database to be created.

Such highlighting is performed entirely on the client side via the GUI by taking advantage of the encoding pre-computed on the server, resulting in a more fluid user interaction.
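To make the notion of provenance as a graph more concrete, the following is a minimal sketch of a lineage query over a toy execution trace. It is not PBase's implementation and does not use Neo4j or the ProvONE model; it only assumes that a trace can be reduced to edges labelled with the W3C PROV relations `used` and `wasGeneratedBy`. The names `TRACE`, `lineage`, and every node identifier are hypothetical and chosen for illustration only.

```python
from collections import deque

# Toy retrospective trace. Each edge is (subject, relation, object) and uses
# the W3C PROV relation names `used` and `wasGeneratedBy`.
# All node identifiers below are hypothetical.
TRACE = [
    ("run_align", "used", "raw_reads"),
    ("aligned", "wasGeneratedBy", "run_align"),
    ("run_plot", "used", "aligned"),
    ("run_plot", "used", "plot_params"),
    ("figure", "wasGeneratedBy", "run_plot"),
]


def lineage(trace, artifact):
    """Return every node that `artifact` transitively depends on."""
    # Both relations point from the dependent node to what it depends on,
    # so a single adjacency map is enough for a backward traversal.
    depends_on = {}
    for subject, _relation, obj in trace:
        depends_on.setdefault(subject, []).append(obj)

    seen = set()
    queue = deque([artifact])
    while queue:
        node = queue.popleft()
        for parent in depends_on.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    # Everything that contributed to the hypothetical `figure` artifact.
    print(sorted(lineage(TRACE, "figure")))
```

In a graph database, a traversal of this kind would typically be written as a declarative path query rather than an explicit breadth-first search; the sketch only illustrates what such a query computes.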

