Abstract

Cloud-based Analytics-as-a-Service (CLAaaS) was developed by Zulkernine et al. with the goal of simplifying big data analytics for users. It provides software-as-a-service access to a variety of back-end analytics tools and data stores. One of these tools is Workflow Instance Generation and Selection (WINGS). WINGS allows users to reuse predefined workflows and their components, which carry semantic metadata, to define new workflows; to bind workflows to data late, at execution time, so that the most recent data are used; and to define domain-specific software code as custom analytic components in workflows. However, the data used in WINGS workflows are mostly flat files stored on the WINGS server or in shared directories. The goal of this project is to add support for big data storage systems to WINGS and to validate the extensions using multiple data analytic workflows of differing complexity, with data residing in a variety of back-end data sources. The extension allows CLAaaS users to create, validate, and execute analytic workflows in a distributed environment and to use data from multiple big data storage systems. We validate our work using four big data storage systems in WINGS workflows, including Apache HBase, MongoDB, and MySQL, with a front-end interface.
