Abstract

In a cluster system with dynamic load sharing support, a job submission or migration to a workstation is determined by the availability of CPU and memory resources of the workstation at the time (L. Xiao et al., 2002). In such a system, a small number of running jobs with unexpectedly large memory allocation requirements may significantly increase the queuing delay times of the rest of jobs with normal memory requirements, slowing down execution of each individual job and decreasing the system throughput. We call this phenomenon the job blocking problem because the big jobs block the execution pace of majority jobs in the cluster. Since the memory demand of jobs may not be known in advance and may change dynamically, the possibility of unsuitable job submissions/migrations to cause the blocking problem is high, and existing load sharing schemes are unable to effectively handle this problem. We propose two schemes to address this problem. The first scheme, network RAM supported load sharing, combines job migrations with network RAM, which uses remote execution to initially allocate a job to the most lightly loaded workstation and, if necessary, network RAM to provide a global memory space for the job larger than it would be available otherwise. This scheme has the merits of both job migrations and network RAM. Our experiments show its effectiveness and scalability. However, this scheme requires a network RAM facility in the cluster, which may cause additional overhead and increase cluster network traffic. In order to address this limit, we propose a second scheme, memory reservation, incorporated with dynamic load sharing, which adaptively reserves a small set of workstations to provide special services to the jobs demanding large memory allocations. As soon as the blocking problem is resolved by the memory reservation scheme, the system will adaptively switch back to the normal load sharing state. Both schemes target on handling large data-intensive jobs in clusters, and are mutually complementary. The network RAM supported load sharing scheme can fully utilize the cluster global memory space, while the memory reservation scheme has the advantage of simple implementations and low overhead. Thus, they both can be effective alternatives, and practically deployed in cluster computing under different system conditions.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.