Abstract

This paper introduces a new load-balancing and communication-minimizing heuristic used in the Inverse Remote Procedure Call (IRPC) system. While the paper briefly describes the IRPC system, the focus is on the new IRPC assignment heuristic. The IRPC compiler maps a distributed program to a graph that represents program objects and their dependencies (due to invocations and parameter passing) as nodes and edges, respectively. In the graph, the system preserves conditional and iterative flows, records network transmission and execution costs, and marks nodes that have to reside at specific network sites. The graph is then partitioned by the heuristic to derive a (sub)optimal assignment of nodes to network sites that balances load and minimizes network data transport. The resulting program partition is then reflected in the physical object distribution, and remote and local object communication is transparently implemented. The compiler and run-time system use efficient implementation techniques such as type prediction, inlining, splitting, and subprogram passing. The last of these allows remote code to be copied to local data, as an alternative to copying data to the remote site, whenever this will reduce network data transport. The IRPC graph partitioning heuristic operates in time O(E(log d + l + log M)), where M is the number of network sites, E is the number of communication edges, and d is the maximum degree of a node; l is a parameter of the algorithm and can vary between 1 and N, where N is the number of communicating objects.
This complexity is more nearly independent of M, and considerably better in terms of E and N, than that of previously known related algorithms, such as A*, which employs backtracking and is potentially exponential, or the max-flow/min-cut class of network flow algorithms and heuristics, which tend to be at least Ω(M N^2 E). By choosing l appropriately, it can be made as efficient as even such fast heuristics as heaviest-edge-first, minimal communication, and Kernighan–Lin. In an extensive quantitative evaluation, the heuristic performed very well, giving on average 75% traffic cost reductions for over 95% of the programs when compared to random partitioning, and outperforming the three aforementioned fast heuristics in both cost reduction and actual execution time, even with a large l. Thus, to the best of our knowledge, this is the first report of a well-performing assignment heuristic that is both essentially linear in the number of communication edges and better than existing, established heuristics of no better complexity.
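
To make the graph model concrete, the following is a minimal Python sketch of the kind of input and output the abstract describes: nodes with execution costs, edges with transmission costs, some nodes pinned to sites, and an assignment scored by the cost of edges that cross sites. It is not the IRPC heuristic, whose details are not given here. The greedy routine is a simplified baseline in the spirit of heaviest-edge-first, and all names (`cut_cost`, `greedy_assign`, `capacity`) as well as the per-site capacity bound used as a crude stand-in for load balancing are assumptions made for illustration.

```python
from collections import defaultdict


def cut_cost(edges, assignment):
    """Sum the transmission cost of edges whose endpoints are on different sites."""
    return sum(w for u, v, w in edges if assignment[u] != assignment[v])


def greedy_assign(nodes, edges, pinned, sites, capacity):
    """Simplified heaviest-edge-first style baseline (illustrative only).

    nodes:    dict mapping node -> execution cost
    edges:    list of (u, v, transmission_cost)
    pinned:   dict mapping node -> site it must reside on
    sites:    list of site names
    capacity: assumed upper bound on total execution cost per site
    """
    assignment = dict(pinned)
    load = defaultdict(float)
    for n, s in pinned.items():
        load[s] += nodes[n]

    def place(n, s):
        assignment[n] = s
        load[s] += nodes[n]

    def least_loaded():
        return min(sites, key=lambda s: load[s])

    # Visit the most expensive edges first and try to keep them local.
    for u, v, _w in sorted(edges, key=lambda e: -e[2]):
        if u == v:
            continue  # self-loops never cross sites
        for a, b in ((u, v), (v, u)):
            if a in assignment and b not in assignment:
                s = assignment[a]
                place(b, s if load[s] + nodes[b] <= capacity else least_loaded())
        if u not in assignment and v not in assignment:
            s = least_loaded()
            place(u, s)
            place(v, s if load[s] + nodes[v] <= capacity else least_loaded())

    # Nodes with no edges go to whichever site is currently lightest.
    for n in nodes:
        if n not in assignment:
            place(n, least_loaded())
    return assignment


# Hypothetical four-object program with two pinned objects.
nodes = {"ui": 1, "parser": 2, "db": 3, "cache": 1}
edges = [("ui", "parser", 10), ("parser", "db", 2), ("db", "cache", 8)]
pinned = {"ui": "client", "db": "server"}
assignment = greedy_assign(nodes, edges, pinned, ["client", "server"], capacity=5)
print(assignment)
print(cut_cost(edges, assignment))  # 2
```

In this toy instance only the cheapest edge (`parser` to `db`) crosses sites, so the cut cost is 2 out of a possible 20. A real evaluation of the kind reported above would compare such costs against random partitioning over many programs.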
