Big data task scheduling in cloud computing environments has gained considerable attention in the past few years, due to the exponential growth in the number of businesses that rely on cloud-based infrastructure as a backbone for big data storage and analytics. The main challenge in scheduling big data services in cloud-based environments is to guarantee a minimal makespan while simultaneously minimizing the amount of resources used. Several approaches have been proposed to address this challenge. Their main limitation is that they overlook the trust levels of the Virtual Machines (VMs), thereby endangering the overall Quality of Service (QoS) of the big data analytic process, which covers not only heartbeat frequency ratio and resource consumption but also security concerns such as intrusion detection, access control, and authentication. To overcome this limitation, we propose a trust-aware scheduling solution called BigTrustScheduling that consists of three stages: VMs' trust level computation, task priority level determination, and trust-aware scheduling. Experiments conducted on a real Hadoop cluster using real-world datasets (Google Cloud Platform pricing data, and Bitbrains task and resource requirements) show that, in the presence of untrusted VMs, our solution reduces the makespan by 59% compared to Shortest Job First (SJF), by 48% compared to Round Robin (RR), and by 40% compared to the improved Particle Swarm Optimization (PSO) approach. Moreover, in the presence of untrusted VMs, our solution decreases the monetary cost by 58% compared to SJF, by 47% compared to RR, and by 38% compared to the improved PSO. The results of this work can be applied to other problems by tuning the corresponding metrics in the problem formulation, the solution, and the experimental environment.
In fact, the trust model can be extended to other environments including cloud computing, IoT, parallel computing, etc.
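The abstract names the three stages of BigTrustScheduling but not their formulas. Purely as an illustration, the Python sketch below chains three such stages under stated assumptions. First, a VM's trust level is taken as a weighted average of its heartbeat frequency ratio and a hypothetical security score. Second, task priority is task length divided by deadline. Third, scheduling greedily places tasks, in priority order, on the trusted VM that would finish them earliest. The names `VM`, `Task`, `trust_level`, `priority`, and `schedule`, the equal weights, and the 0.5 trust threshold are all hypothetical. This is a minimal sketch and not the authors' algorithm.

```python
from dataclasses import dataclass

# Hypothetical threshold: VMs scoring below it are treated as untrusted.
TRUST_THRESHOLD = 0.5


@dataclass
class VM:
    name: str
    speed: float            # work units processed per second (assumed unit)
    heartbeat_ratio: float  # received / expected heartbeats, in [0, 1]
    security_score: float   # hypothetical aggregate of security checks, in [0, 1]


@dataclass
class Task:
    name: str
    length: float    # work units
    deadline: float  # seconds


def trust_level(vm: VM, w_hb: float = 0.5, w_sec: float = 0.5) -> float:
    """Stage 1 (sketch): weighted average of heartbeat ratio and security score."""
    return w_hb * vm.heartbeat_ratio + w_sec * vm.security_score


def priority(task: Task) -> float:
    """Stage 2 (sketch): more work per unit of deadline means higher priority."""
    return task.length / task.deadline


def schedule(tasks, vms, threshold=TRUST_THRESHOLD):
    """Stage 3 (sketch): in descending priority order, place each task on the
    trusted VM that would finish it earliest; return the plan and the makespan."""
    trusted = [vm for vm in vms if trust_level(vm) >= threshold]
    if not trusted:
        raise ValueError("no VM meets the trust threshold")
    ready = {vm.name: 0.0 for vm in trusted}  # time at which each VM becomes free
    plan = {}
    for task in sorted(tasks, key=priority, reverse=True):
        best = min(trusted, key=lambda vm: ready[vm.name] + task.length / vm.speed)
        ready[best.name] += task.length / best.speed
        plan[task.name] = best.name
    return plan, max(ready.values())


vms = [
    VM("vm1", speed=2.0, heartbeat_ratio=0.95, security_score=0.9),
    VM("vm2", speed=4.0, heartbeat_ratio=0.3, security_score=0.2),  # untrusted
    VM("vm3", speed=1.0, heartbeat_ratio=0.8, security_score=0.7),
]
tasks = [Task("t1", 8, 10), Task("t2", 4, 2), Task("t3", 6, 12)]
plan, makespan = schedule(tasks, vms)
print(plan)      # {'t2': 'vm1', 't1': 'vm1', 't3': 'vm3'}
print(makespan)  # 6.0
```

In this toy run, `vm2` is the fastest machine but is excluded because its trust level (0.25) falls below the threshold. Filtering out untrusted VMs before placement is the simplest way to make a greedy scheduler trust-aware. The paper's scheduler may instead weigh trust against makespan and monetary cost in a different way.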