Abstract

BigDataDIRAC is a Big Data solution with a Distributed Infrastructure with Remote Agent Control (DIRAC) access point. Users have the opportunity to access multiple Big Data resources scattered in different geographical areas, such as access to grid resources. This approach opens the possibility of offering not only grid and cloud to the users, but also Big Data resources from the same DIRAC environment. In this work, we describe a system to allow access to a federation of Big Data resources, including load balancing, using DIRAC. Our results demonstrate the ability of BigDataDIRAC to manage jobs driven by dataset location stored in the Hadoop File System (HDFS) of the Hadoop distributed clusters. DIRAC is used to monitor the execution, collect the necessary statistical data, and upload the results from the remote HDFS to the SandBox Storage machine. Performance results demonstrate that BigDataDIRAC load balancing is able to aggregate resources in an efficient manner.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call