In this paper, we propose a heuristic for code partitioning for distributed memory multiprocessors (DMMs). Our method is data-flow based, so all levels of parallelism can potentially be exploited. Given a weighted directed acyclic graph (DAG) representation of the program, our partitioning algorithm automatically determines the granularity of parallelism by partitioning the graph into tasks to be scheduled on the DMM. The granularity of parallelism depends only on the program to be executed and on the target machine parameters. The output of our algorithm is passed as input to the scheduling phase. In contrast to the scheduling problem as defined by Yang [A. Gerasoulis, T. Yang, IEEE Transactions on Parallel and Distributed Systems 4 (6) (1993) 686–701; T. Yang, Ph.D. Thesis, Rutgers University, New Brunswick, NJ, May 1993; T. Yang, A. Gerasoulis, IEEE Transactions on Parallel and Distributed Systems 5 (9) (1994) 951–967], which relies on task clustering, the method presented in this paper uses task merging. Finding an optimal solution to this problem is NP-complete. Because graph algorithms are costly, it is nearly impossible to obtain close-to-optimal solutions without incurring very high (higher-order polynomial) cost. Our goal is therefore to find a heuristic that gives good performance at relatively low cost. Given a DAG with E edges and N nodes, the time complexity of our partitioning algorithm is O(E·N^3) in the worst case. In some cases, the average time complexity of the algorithm is O(N(E+N)).
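To make the idea of merging tasks in a weighted DAG concrete, the following Python sketch may help. It is an illustration under stated assumptions and not the heuristic proposed in this paper, whose merging rule is not given above. The function names `merge_tasks` and `has_alternate_path`, the greedy heaviest-edge-first order, and the merge criterion (an edge is merged when its communication cost is at least the smaller of the two computation costs it connects) are all hypothetical. The one property the sketch relies on is standard: contracting an edge u -> v keeps the graph acyclic exactly when no other path leads from u to v. The sketch makes no attempt to match the complexity bounds stated above.

```python
from collections import defaultdict


def has_alternate_path(succ, u, v):
    """Return True if v is reachable from u without using the direct edge u -> v."""
    stack = [w for w in succ[u] if w != v]
    seen = set(stack)
    while stack:
        x = stack.pop()
        if x == v:
            return True
        for y in succ[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False


def merge_tasks(node_cost, edge_cost):
    """Greedily merge DAG nodes joined by expensive edges (hypothetical rule).

    node_cost: {node: computation cost}
    edge_cost: {(u, v): communication cost of edge u -> v}
    Returns (task costs, remaining edge costs, {original node: task id}).
    """
    cost = dict(node_cost)
    edges = dict(edge_cost)
    owner = {n: n for n in node_cost}

    while True:
        # Rebuild the successor lists of the current (partially merged) graph.
        succ = defaultdict(set)
        for (u, v) in edges:
            succ[u].add(v)

        # Assumed criterion: take the heaviest edge whose communication cost is
        # at least the smaller endpoint cost and whose contraction is cycle-free.
        candidate = None
        for (u, v), c in sorted(edges.items(), key=lambda kv: -kv[1]):
            if c >= min(cost[u], cost[v]) and not has_alternate_path(succ, u, v):
                candidate = (u, v)
                break
        if candidate is None:
            return cost, edges, owner

        # Merge v into u: add the computation costs and redirect v's edges.
        u, v = candidate
        cost[u] += cost.pop(v)
        new_edges = {}
        for (a, b), c in edges.items():
            a2 = u if a == v else a
            b2 = u if b == v else b
            if a2 == b2:
                continue  # edge now lies inside one task, so it is dropped
            new_edges[(a2, b2)] = new_edges.get((a2, b2), 0) + c
        edges = new_edges
        for n, t in owner.items():
            if t == v:
                owner[n] = u
```

For a four-node diamond with computation costs a=2, b=3, c=1, d=4 and communication costs a->b=5, a->c=1, b->d=1, c->d=6, this sketch merges c with d and a with b, leaving two tasks of cost 5 each joined by a single edge of cost 2. A different criterion, for example one that uses target machine parameters, would give a different granularity.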