Dynamic dataflow machines exploit parallelism among loop iterations by loop unraveling: all iterations of the loop are started together, and operations in the various iterations execute when their input data are present. Unbounded loop unraveling can strain the resources available on the machine and, in extreme cases, can cause deadlock through overcommitment of resources. Previous efforts to address this problem have focused mainly on run-time mechanisms of debatable utility. Loop bounding, a compile-time technique, controls parallelism by permitting only a fixed number of iterations to execute at one time. In this paper, we argue that loop bounding can lead to inefficient use of resources, and we propose an alternative way of compiling loops for overlapped execution of loop iterations. We introduce the notion of a stage decomposition of a loop, which partitions the operations of a loop iteration into stages, and we show that choosing a stage decomposition for a particular loop can be tackled with static scheduling techniques like those used to generate code for VLIW machines. These techniques let the compiler allocate resources more effectively than loop bounding does. The practical utility of stage decomposition remains to be tested on a real dataflow machine. Since none is available yet, we describe how our schema could be implemented on the Monsoon dataflow machine being built at MIT.
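The link between stage decomposition and VLIW-style static scheduling can be illustrated with a small sketch. This is not the paper's algorithm. It borrows the modulo-scheduling (software pipelining) idea and makes several assumptions: one loop iteration is an acyclic dependence graph, each cycle slot modulo an initiation interval `ii` can issue at most `units` operations, and a greedy placement with no backtracking is good enough. Each operation is placed at the earliest cycle that satisfies its dependences and the resource limit, and it belongs to stage `cycle // ii`. The function name `stage_decomposition` and all of its parameters are hypothetical.

```python
from collections import defaultdict


def stage_decomposition(ops, deps, latency, units, ii):
    """Hypothetical sketch of a modulo-scheduling-style stage decomposition.

    ops     -- operation names of one loop iteration, in topological order
    deps    -- dict: op -> list of predecessor ops (same iteration only)
    latency -- dict: op -> cycles until its result is available
    units   -- max operations issued in each cycle slot (modulo ii); assumed
    ii      -- initiation interval: cycles between successive iteration starts
    Returns (cycle of each op, dict mapping stage number -> ops in that stage).
    """
    if len(ops) > units * ii:
        # Not enough issue slots per interval; the placement loop below
        # could never terminate, so reject the request up front.
        raise ValueError("ii too small for the given resource limit")
    cycle = {}
    slot_use = defaultdict(int)  # modulo reservation table: slot -> ops issued
    for op in ops:
        # Earliest cycle at which all of this op's inputs are available.
        t = max((cycle[p] + latency[p] for p in deps.get(op, [])), default=0)
        # Slide forward until slot (t mod ii) has a free unit.
        while slot_use[t % ii] >= units:
            t += 1
        cycle[op] = t
        slot_use[t % ii] += 1
    # A new iteration starts every ii cycles, so an operation scheduled at
    # cycle t belongs to stage t // ii of its iteration.
    stages = defaultdict(list)
    for op in ops:
        stages[cycle[op] // ii].append(op)
    return cycle, dict(stages)


if __name__ == "__main__":
    ops = ["load", "mul", "add", "store"]
    deps = {"mul": ["load"], "add": ["mul"], "store": ["add"]}
    latency = {"load": 2, "mul": 2, "add": 1, "store": 1}
    cycle, stages = stage_decomposition(ops, deps, latency, units=2, ii=2)
    print(cycle)   # {'load': 0, 'mul': 2, 'add': 5, 'store': 7}
    print(stages)  # {0: ['load'], 1: ['mul'], 2: ['add'], 3: ['store']}
```

In this toy example the iteration spans four stages. With a new iteration started every `ii` cycles, at most four iterations are in flight at once, and the compiler chose that bound from the resource table rather than fixing it in advance as loop bounding does.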