Abstract

Realistic, relevant, and reproducible experiments often need input traces collected from real-world environments. In this work, we focus on traces of workflows, which are common in datacenters, clouds, and HPC infrastructures. We show that the state of the art in using workflow traces raises important issues: (1) the use of realistic traces is infrequent, and (2) the use of realistic, open-access traces is even rarer. To alleviate these issues, we introduce the Workflow Trace Archive (WTA), an open-access archive of workflow traces from diverse computing infrastructures, together with tooling to parse, validate, and analyze traces. The WTA includes more than 48 million workflows captured from more than 10 computing infrastructures, representing a broad diversity of trace domains and characteristics. To emphasize the importance of trace diversity, we characterize the WTA contents and use simulation to analyze the impact of trace diversity on experiment results. Our results indicate significant differences in characteristics, properties, and workflow structures between workload sources, domains, and fields.
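The abstract mentions tooling to validate and analyze workflow traces. As an illustration only, the sketch below assumes a workflow is a set of tasks, each with a runtime and a list of parent tasks it depends on. The `Task` fields and the `validate_workflow` and `critical_path` functions are hypothetical names, not the WTA trace format or its tools. The sketch checks basic structural validity (unique IDs, non-negative runtimes, known parents, no dependency cycles) and computes one simple structural metric, the critical-path length.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical minimal task record (not the WTA schema):
    # an ID, a runtime, and the IDs of the tasks it depends on.
    task_id: int
    runtime: float
    parents: list[int] = field(default_factory=list)


def validate_workflow(tasks: list[Task]) -> list[str]:
    """Return the problems found; an empty list means the checks passed."""
    ids = [t.task_id for t in tasks]
    if len(ids) != len(set(ids)):
        return ["duplicate task IDs"]
    known = set(ids)
    problems = []
    for t in tasks:
        if t.runtime < 0:
            problems.append(f"task {t.task_id} has negative runtime")
        for p in t.parents:
            if p not in known:
                problems.append(f"task {t.task_id} references unknown parent {p}")

    # Kahn's algorithm: if not every task can be topologically ordered,
    # the dependency graph contains a cycle (i.e. it is not a DAG).
    children = {i: [] for i in known}
    indegree = {i: 0 for i in known}
    for t in tasks:
        for p in set(t.parents):
            if p in known:
                children[p].append(t.task_id)
                indegree[t.task_id] += 1
    queue = deque(i for i in known if indegree[i] == 0)
    ordered = 0
    while queue:
        node = queue.popleft()
        ordered += 1
        for c in children[node]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    if ordered != len(known):
        problems.append("dependency cycle detected")
    return problems


def critical_path(tasks: list[Task]) -> float:
    """Longest sum of runtimes along any dependency chain (assumes a valid DAG)."""
    by_id = {t.task_id: t for t in tasks}
    finish: dict[int, float] = {}

    def earliest_finish(tid: int) -> float:
        # A task can start only after all of its parents have finished.
        if tid not in finish:
            t = by_id[tid]
            start = max((earliest_finish(p) for p in t.parents), default=0.0)
            finish[tid] = start + t.runtime
        return finish[tid]

    return max((earliest_finish(t.task_id) for t in tasks), default=0.0)


if __name__ == "__main__":
    # A diamond-shaped workflow: 1 -> {2, 3} -> 4
    wf = [Task(1, 2.0), Task(2, 3.0, [1]), Task(3, 1.0, [1]), Task(4, 4.0, [2, 3])]
    print(validate_workflow(wf))  # []
    print(critical_path(wf))      # 9.0
```

Kahn's algorithm is used here because it detects cycles without recursion; the critical-path computation then relies on the workflow having passed validation.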

Highlights

  • Workflows are already a significant part of private datacenter and public cloud infrastructures [1], [2]

  • We identify a comprehensive set of requirements for a workflow trace archive

  • We identify as a requirement that an archive must include a diverse set of traces covering a broad spectrum of workflow sizes, structures, and other characteristics, including both characteristics general to many domains and fields and idiosyncratic characteristics specific to a single domain or field

Summary

INTRODUCTION

Workflows are already a significant part of private datacenter and public cloud infrastructures [1], [2]. Since the introduction of commercial clouds, clients have increasingly asked for better QoS and, in particular, have increasingly expressed non-functional requirements (NFRs), such as availability, privacy, and security demands, in traces [4], [17]. This leads us to research question RQ-2: How to support sharing workflow traces in a common, unified format? ACM introduced artifact review and badges to stimulate the release of both software and data artifacts for reproducibility and verification purposes [18]. We add our contribution to this community effort, together with a question that is scientific in nature: RQ-3: What is the impact of the source and domain of a trace on the characteristics of workflows? All data used in this survey is available as open-access data and can be used to verify and extend this survey.

A SURVEY OF WORKFLOW TRACE USAGE
Article Selection and Labeling
THE WORKFLOW TRACE ARCHIVE
Use Cases and Requirements
Overview of the WTA
Workflow Model
Unified Trace Format
Mechanisms for Trace Selection
Tools for Analysis and Validation
Current Content
A CHARACTERIZATION OF WORKLOADS OF WORKFLOWS
Structural Patterns
Arrival Patterns
Burstiness
Parallelism in Workflows
Limits to Parallelism in Workflows
ADDRESSING CHALLENGES OF VALIDITY
RELATED WORK
Findings
CONCLUSION AND ONGOING WORK