Abstract
Big data workflow management systems (BDWMSs) have recently emerged as popular platforms for conducting large-scale data analytics in the cloud. However, protecting data confidentiality and securing the execution of workflow applications remain important and challenging problems. Although a few data analytics systems, such as VC3 and Opaque, were developed to address security problems, they are limited to specific domains such as MapReduce-style and SQL query workflows. A generic secure framework for BDWMSs is still missing. In this article, we propose SecDATAVIEW, a distributed BDWMS that employs heterogeneous workers, such as Intel SGX and AMD SEV, to protect both workflows and workflow data during execution. SecDATAVIEW addresses three major security challenges: (1) reducing the trusted computing base (TCB) size of the big data workflow management system in the untrusted cloud by leveraging a hardware-assisted trusted execution environment (TEE) and software attestation; (2) supporting workflow tasks written in Java, overcoming SGX’s lack of support for Java programs; and (3) reducing the adverse impact of SGX enclave memory paging overhead through a “Hybrid” workflow task scheduling system that selectively deploys sensitive tasks to a mix of SGX and SEV worker nodes. Our experimental results show that SecDATAVIEW imposes moderate overhead on the workflow execution time.
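The “Hybrid” scheduling idea in challenge (3) can be illustrated with a small sketch. The abstract does not state the actual placement rule, so everything below is an assumption made for illustration: the `Task` and `WorkerKind` types, the `est_memory_mb` field, the `enclave_budget_mb` threshold and its default value are all hypothetical and are not SecDATAVIEW's API. The assumed policy sends a sensitive task to an SGX worker when its estimated working set fits within a configurable enclave memory budget, and to an SEV worker otherwise, so that large tasks avoid enclave paging.

```python
from dataclasses import dataclass
from enum import Enum


class WorkerKind(Enum):
    SGX = "sgx"
    SEV = "sev"


@dataclass(frozen=True)
class Task:
    name: str
    est_memory_mb: int  # hypothetical per-task memory estimate


def assign_worker(task: Task, enclave_budget_mb: int = 90) -> WorkerKind:
    """Pick a worker kind for one sensitive task.

    Assumed policy (not taken from the paper): tasks whose estimated
    working set fits within the enclave budget run on SGX; larger tasks
    run on SEV to avoid enclave paging overhead.
    """
    if task.est_memory_mb <= enclave_budget_mb:
        return WorkerKind.SGX
    return WorkerKind.SEV


def schedule(tasks, enclave_budget_mb: int = 90):
    """Map each task name to the worker kind chosen by assign_worker."""
    return {t.name: assign_worker(t, enclave_budget_mb) for t in tasks}
```

For example, `schedule([Task("parse", 10), Task("train", 500)])` would place `parse` on an SGX worker and `train` on an SEV worker under this assumed rule. A real scheduler would also need to account for worker availability and data placement, which this sketch ignores.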
Highlights
Today, technological advances make it possible to collect and store large amounts of data from many different sources, such as event logs, the Internet, smartphones, databases, sensors, and IoT devices [1].
We present SecDATAVIEW, a new distributed big data workflow management system (BDWFMS) that leverages Intel Software Guard eXtensions (SGX) and AMD Secure Encrypted Virtualization (SEV) to provide a trusted execution environment (TEE) for the secure execution of big data workflows.
While big data provides invaluable information for the decision-making process, big data analysis poses various challenges in storage, transfer, processing, and management due to the following big data characteristics [1], [31]: (1) Volume, the size of big data records, which ranges from terabytes to petabytes; (2) Variety, the format of data records, which can be structured, semi-structured, or unstructured and which typically do not follow a particular data schema; (3) Velocity, the data arrival speed, which can be very high in real-time applications that require rapid, on-the-fly processing of data; and (4) Value, the results that can be extracted from big data records, categorized as statistical, hidden, and unknown.
Summary
Technological advances make it possible to collect and store large amounts of data (referred to as big data) from many different sources, such as event logs, the Internet, smartphones, databases, sensors, and IoT devices [1]. A big data workflow management system (BDWFMS) is a system that completely defines, modifies, manages, monitors, and executes scientific workflows on the cloud in the order driven by the workflow logic [23], [32]. SecDATAVIEW was developed based on the DATAVIEW scientific workflow management system [23]. DATAVIEW represents the state of the art in big data workflow management systems and has a strong user base (over 700 registered users worldwide). The Task Management module is responsible for executing workflow tasks in the cloud. SGX protects the integrity of the enclave code and data, even when the high-privileged system software is compromised [37]. To speed up the execution of parallel applications, SGX supports multiple threads inside the enclave.
More From: IEEE Transactions on Dependable and Secure Computing