Abstract

Datasets that provide a ground truth to quantify the efficacy of automated algorithms are rare because manually annotating observations, although highly valuable, is time-consuming and expensive. Such datasets exist for niche problems in developed fields such as Natural Language Processing (NLP) and Business Process Mining (BPM); however, it is difficult to find a suitable dataset for use cases that span multiple fields, such as the one described in this study. The lack of established ground truth maps between cyberspace and the human-interpretable, persona-driven tasks that occur therein is one of the principal barriers preventing reliable, automated situation awareness of dynamically evolving events and the consequences of loss due to cybersecurity breaches. Automated workflow analysis (the machine-learning-assisted identification of templates of repeated tasks) is the likely missing link between semantic descriptions of mission goals and observable events in cyberspace. We summarize our efforts to establish a ground truth for an email dataset pertaining to the operation of an open source software project. The ground truth defines semantic labels for each email and the arrangement of emails within sequences that describe actions observed in the dataset. Identified sequences are then used to define template workflows that describe the possible tasks undertaken for a project and their business process model. We present the overall purpose of the dataset, the methodology for establishing a ground truth, and lessons learned from the effort. Finally, we report on the proposed use of the dataset for the workflow discovery problem, and its effect on system accuracy.

Highlights

  • The prevalence of Information and Communication Technology (ICT) systems, and their function as critical capability enablers, poses a risk for organizations should they become degraded, compromised, or inoperable [1]

  • In this work, we constructed a ground truth dataset that describes a subset of the business functions for an open source software project to facilitate methods for automated workflow analysis

  • This ground truth contains keywords, metalabels, traces, and actions, manually annotated via the Delphi consensus method, that serve as meaningful descriptors for constructing the workflows that best describe these business functions


Summary

Introduction

The prevalence of Information and Communication Technology (ICT) systems, and their function as critical capability enablers, poses a risk for organizations should they become degraded, compromised, or inoperable [1]. Commanders want to develop risk management processes to protect their ICT capability enablers and provide mission assurance, where Mission Assurance (MA) is defined as “measures required to accomplish essential objectives of missions in a contested environment” [2].

Reflect the views of the Assistant Secretary of Defense for Research and Engineering (https://www.acq.osd.mil/chieftechnologist/). The sponsor agrees that this material is releasable to the public without restriction.
