There is enormous demand for high-speed data communication and high-speed internet access using Long Term Evolution (LTE) and LTE-Advanced (LTE-A). Achieving a high data rate at the receiver requires high throughput in the Fast Fourier Transform (FFT) architecture, so there is a need to improve the throughput of the FFT architectures used for high-speed data communication. In this work, the FFT architecture is designed and optimized for LTE-A applications; its high throughput, however, is achieved at the cost of additional Field Programmable Gate Array (FPGA) hardware resources. Researchers have proposed many fixed- and variable-length FFT processors whose improved performance comes from algorithmic modifications, novel architectural optimizations, or radix selection. Against this background, the main objectives of this paper are: (1) to design and implement a pipelined FFT architecture that delivers high throughput for LTE-A MIMO applications; (2) to develop an intellectual property (IP) core for FFT computation with variable FFT size; and (3) to propose a parallel implementation that increases the performance of the LTE-A baseband processing system. In an Orthogonal Frequency Division Multiplexing (OFDM) baseband communication system, the FFT is one of the most computationally intensive tasks and directly influences the performance of the communication system. The baseband hardware must be capable and efficient enough to compute the FFT within specific timing constraints. Parallel pipelined multi-radix variable-length FFT architectures have therefore been designed for LTE-A Multiple-Input Multiple-Output (MIMO) applications. The proposed FFT architecture delivers a throughput of 550 mega-samples per second (MSPS) at the maximum clock rate of 550 MHz and supports FFT lengths of 64, 128, 256, 512, 1024 and 2048 for the LTE-Advanced MIMO standard.
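As a rough software illustration of what a variable-length FFT core computes, the following sketch accepts only the six FFT lengths listed above and evaluates the transform with a plain recursive radix-2 decimation-in-time algorithm. It is a reference model under stated assumptions, not the proposed pipelined multi-radix hardware: the function names (`fft_radix2`, `variable_length_fft`, `naive_dft`) are hypothetical, and radix-2 is used here only because every listed length is a power of two.

```python
import cmath

# FFT lengths listed in the text for LTE-Advanced.
SUPPORTED_LENGTHS = (64, 128, 256, 512, 1024, 2048)


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT (software reference only)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])  # FFT of even-indexed samples
    odd = fft_radix2(x[1::2])   # FFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor W_n^k applied to the odd branch (butterfly).
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def variable_length_fft(x):
    """Reject lengths outside the supported set, then run the FFT."""
    if len(x) not in SUPPORTED_LENGTHS:
        raise ValueError(f"unsupported FFT length: {len(x)}")
    return fft_radix2(x)


def naive_dft(x):
    """Direct O(N^2) DFT, used only to cross-check the FFT result."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


if __name__ == "__main__":
    samples = [complex(i % 7, -(i % 3)) for i in range(64)]
    spectrum = variable_length_fft(samples)
    reference = naive_dft(samples)
    max_err = max(abs(a - b) for a, b in zip(spectrum, reference))
    print(f"N = {len(samples)}, max deviation from direct DFT = {max_err:.2e}")
```

A model of this kind is typically useful as a golden reference when checking the output of a hardware FFT core in simulation; it says nothing about the pipelining, parallelism, or radix choices of the proposed design.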
The proposed FFT architecture has been implemented on a Xilinx Artix-7 FPGA device and its performance metrics have been analysed. Although the proposed architecture consumes additional Block RAMs (BRAMs) and a considerable number of Xilinx XtremeDSP (DSP48) resources, its Power Delay Product (PDP) is better than that of existing FFT architectures. The proposed parallel pipelined multi-radix (mixed radix 2/3/5) 2048-point FFT architecture exhibits a low latency of 11.382 µs at a clock frequency of 550 MHz, compared with 56.88 µs at 200 MHz for the existing system. Moreover, for the 4x4 MIMO-OFDM architecture, the designed FFT uses 96 Xilinx XtremeDSP blocks and takes 25040 clock cycles to complete the FFT operation, giving a throughput of 2200 MSPS and an FFT computation time of 45.528 µs at 550 MHz. The designed FFT therefore provides the high throughput needed to meet modern wireless standard specifications. It outperforms existing designs in throughput, latency and PDP, for both single-FFT and MIMO operation, with extra hardware as the trade-off.
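The timing figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the reported numbers. It assumes, as an interpretation not stated in the text, that each of the four MIMO streams is served by one pipeline accepting one sample per clock cycle, and that the single-FFT latency is one quarter of the 4x4 computation time. All constant and function names are hypothetical.

```python
# Figures quoted in the text.
CLOCK_HZ = 550e6          # maximum clock frequency
CYCLES_4X4 = 25040        # clock cycles for the 4x4 MIMO-OFDM FFT operation
STREAMS = 4               # assumed number of parallel FFT pipelines (4x4 MIMO)


def cycles_to_us(cycles, clock_hz):
    """Convert a clock-cycle count to microseconds."""
    return cycles / clock_hz * 1e6


# Computation time for the 4x4 case: 25040 cycles at 550 MHz.
total_us = cycles_to_us(CYCLES_4X4, CLOCK_HZ)

# Assumption: single-FFT latency is the 4x4 time split over four streams.
per_fft_us = total_us / STREAMS

# Assumption: one sample per clock per pipeline, so throughput scales
# with the number of parallel pipelines.
single_msps = CLOCK_HZ / 1e6
aggregate_msps = STREAMS * single_msps

# Ratio of the existing system's latency to the proposed latency.
latency_ratio = 56.88 / 11.382

if __name__ == "__main__":
    print(f"4x4 computation time : {total_us:.3f} us")
    print(f"single-FFT latency   : {per_fft_us:.3f} us")
    print(f"single throughput    : {single_msps:.0f} MSPS")
    print(f"aggregate throughput : {aggregate_msps:.0f} MSPS")
    print(f"latency ratio        : {latency_ratio:.2f}x")
```

Under these assumptions the computed values agree with the reported 45.528 µs, 11.382 µs, 550 MSPS and 2200 MSPS to within rounding, and the latency ratio against the existing system comes out at roughly five.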