Objectives: The objective of this work is to design efficient FFT architectures for real valued signals. For higher value of N, the design of FFT architecture has many butterflies in the same column. A considerable amount of power is needed for such architecture. The hardware complexity also increases. Methods: The throughput depends on the number of input data paths available, which also increases hardware complexity for higher values of N. This work aims at overcoming these shortcomings by employing one of the effective techniques such as parallel pipelined architectures. One butterfly structure is introduced in one column. For handling the complex data paths, different butterfly structures have been introduced using parallel processing and Folding methodology. The design is simulated in Xilinxs14.3, and synthesized in Cadence RTL compiler. Findings: The hardware complexity is reduced by introducing one butterfly structure in one column. Comparison is made between the different radix algorithms and pipelined architectures and Radix-2 multi path delay commutator (R2MDC) in terms of the required number of adders, delay elements, multiplier units, throughput and control complexity. Although the parallel architecture consumes more power, it occupies less area when compared to the R2MDC architecture. Parallel architecture is more area efficient than R2MDC, whereas the radix-2 multi-path delay commutator is simple to implement, since feedback is not needed in the design. Conclusion: This work can be extended to higher order N-values of DIF and DIT-FFT flow graphs.