Nested loops account for a significant portion of runtime in multimedia and DSP applications, an important application domain for coarse-grained reconfigurable architectures (CGRAs). Conventional approaches to mapping nested loops pipeline along a single loop dimension only, either the innermost loop or an outer loop. In this paper, we explore an orthogonal approach: pipelining along multiple loop dimensions by first flattening the loop nest. To remedy the inevitable problem of repetitive outer-loop computation in flattened loops, we present a small set of special operations that effectively reduce the number and frequency of micro-operations in the pipelined loop. We also present a loop transformation technique that makes our special operations applicable to a broader range of loops, including those with triangular iteration spaces. Our experimental results on imperfect loops from StreamIt benchmarks demonstrate that our special operations cover a large portion of the operations in flattened loops, improve the performance of nested loops by nearly 30% over loop flattening alone, and achieve near-ideal execution on CGRAs for imperfect loops.
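
To make the term "loop flattening" concrete, the following is a minimal sketch of flattening a two-level imperfect loop nest with a rectangular iteration space. It is an illustration only, not the special operations or the transformation proposed in this paper; the function names `nested_sum` and `flattened_sum` and the row-sum workload are hypothetical and chosen for brevity. The sketch assumes an inner trip count of at least one.

```python
def nested_sum(a, n, m):
    """Imperfect nest: outer-loop work surrounds the inner loop."""
    out = []
    for i in range(n):
        acc = 0                  # outer-loop prologue
        for j in range(m):
            acc += a[i][j]       # inner-loop body
        out.append(acc)          # outer-loop epilogue
    return out


def flattened_sum(a, n, m):
    """Same computation as a single loop of n * m iterations (assumes m >= 1)."""
    out = []
    i = j = 0
    acc = 0
    for _ in range(n * m):
        if j == 0:               # guarded outer-loop prologue
            acc = 0
        acc += a[i][j]           # inner-loop body
        if j == m - 1:           # guarded outer-loop epilogue
            out.append(acc)
        # Index bookkeeping that the loop nest previously provided implicitly.
        j += 1
        if j == m:
            j = 0
            i += 1
    return out


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5, 6]]
    print(nested_sum(data, 2, 3))
    print(flattened_sum(data, 2, 3))
```

In the flattened version, the guards and index updates are evaluated on every iteration even though the outer-loop work is needed only once per `m` iterations. This is the repetitive outer-loop computation that the abstract refers to.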