Abstract

Cloud-based scientific data management - storage, transfer, analysis, and inference extraction - is attracting interest. In this paper, we propose a next-generation cloud deployment model suitable for data-intensive applications. Our model is a flexible, self-service, container-based infrastructure that delivers network, computing, and storage resources together with the logic to dynamically manage these components in a holistic manner. We demonstrate the strength of our model with a bioinformatics application. Dynamic algorithms for resource provisioning and job allocation suited to the chosen dataset are packaged and delivered in a privileged virtual machine as part of the container. We tested the model on our private experimental cloud, built on low-cost commodity hardware, and demonstrate its capability to create the required network and computing resources and to allocate submitted jobs. The results obtained show the benefits of increased automation, both as a significant improvement in the time to complete a data analysis and as a reduction in the cost of analysis. The proposed algorithms reduced the cost of performing analysis by 50% for a 15 GB data analysis, and the total time between submitting a job and writing the results after analysis was also reduced by more than 1 hour for a 15 GB data analysis.

Highlights

  • Large-scale data are increasingly generated from a wide variety of sources such as scientific experiments and monitoring devices

  • LJF-KQ algorithm: To provision the required VMs, we propose a variation of [30], the Largest Job First on K Queues (LJF-KQ) strategy

  • LJF-KQ-L algorithm: The Largest Job First on K Queues with Lookup (LJF-KQ-L) algorithm is a variation of LJF-KQ that adds a lookup of queue finish times (a sketch of both strategies follows this list)
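
The paper's own pseudocode for these strategies is not reproduced in this excerpt. As a minimal sketch, assuming jobs are characterised only by their input size and queues by a single processing rate, LJF-KQ can be read as dealing jobs out largest-first across K VM queues, while LJF-KQ-L additionally looks up each queue's estimated finish time and places the next job on the queue that frees up first. The round-robin placement, the rate parameter, and the job records below are illustrative assumptions, not the paper's definitions.

import heapq

def ljf_kq(jobs, k):
    # LJF-KQ reading: sort jobs largest-first, then deal them out across
    # K VM queues. The round-robin placement is an assumption.
    queues = [[] for _ in range(k)]
    ordered = sorted(jobs, key=lambda j: j["size"], reverse=True)
    for i, job in enumerate(ordered):
        queues[i % k].append(job)
    return queues

def ljf_kq_l(jobs, k, rate=1.0):
    # LJF-KQ-L reading: same largest-first ordering, but each job goes to
    # the queue with the earliest estimated finish time. `rate` (data
    # processed per unit time) is a hypothetical parameter used only to
    # estimate finish times for this illustration.
    queues = [[] for _ in range(k)]
    finish = [(0.0, q) for q in range(k)]   # (estimated finish time, queue index)
    heapq.heapify(finish)
    for job in sorted(jobs, key=lambda j: j["size"], reverse=True):
        t, q = heapq.heappop(finish)        # queue that frees up first
        queues[q].append(job)
        heapq.heappush(finish, (t + job["size"] / rate, q))
    return queues

# Example: five jobs (sizes in GB) spread over K = 2 VM queues.
jobs = [{"id": i, "size": s} for i, s in enumerate([7, 3, 5, 1, 4])]
print(ljf_kq(jobs, 2))
print(ljf_kq_l(jobs, 2))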


Introduction

Large-scale data are increasingly generated from a wide variety of sources such as scientific experiments and monitoring devices.

Service layer: This layer enables the functionality for container-based cloud service creation. It provisions resources (e.g. CPU time, memory, storage, and network bandwidth) to a vCell, interacts with the underlying layer, and performs additional global scheduling.

Implementing the data analysis container: Our work considers self-service and dynamic algorithms for the initial VM size, VM provisioning, and job allocation. For application types with requirements other than memory, a different scheme is required. In both cases, knowledge of the domain and the historical output trace of previously executed related jobs are valuable inputs to the inference mechanism that determines the relationship between the task requirements and the capacity (e.g. bandwidth, memory, CPU) of the VMs; a sketch of such an estimate follows this excerpt.

Step 3: Creation of a virtual network interface (VIF) for the created VM.
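
As an illustration of the inference step described above, the sketch below estimates an initial VM memory size for a new job from the historical output traces of previously executed related jobs. The linear fit, the 20% headroom factor, the function name estimate_vm_memory, and the example trace values are assumptions made for this illustration only; they are not the paper's actual inference mechanism.

def estimate_vm_memory(input_size_gb, history, headroom=1.2):
    # `history` is a list of (input_size_gb, peak_memory_gb) records from
    # previously executed related jobs. A simple least-squares fit of peak
    # memory against input size is used here; the headroom factor adds a
    # safety margin to the prediction. Both choices are illustrative.
    n = len(history)
    sx = sum(x for x, _ in history)
    sy = sum(y for _, y in history)
    sxx = sum(x * x for x, _ in history)
    sxy = sum(x * y for x, y in history)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return headroom * (slope * input_size_gb + intercept)

# Example: traces from three earlier runs of the same analysis.
history = [(5, 2.1), (10, 4.0), (15, 6.2)]
print(round(estimate_vm_memory(12, history), 2), "GB")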
