The memoryless nonlinear transform (MNLT) method is widely used in statistical models for sea clutter simulation. Once radar scattering data sets have been obtained, large-scene, long-duration, time-varying 3-D sea surface scattering can be simulated quickly and accurately using an extended power spectrum. Compared with a personal computer platform, a field-programmable gate array (FPGA) offers energy efficiency, high performance, and adaptability. In this article, we propose a novel architecture for implementing the 3-D MNLT algorithm on an FPGA using high-level synthesis. As the simulation size increases, the demand for storage grows rapidly, and the on-chip memory of the FPGA becomes a limiting resource. To address this problem, we divide the 3-D space-time simulation into a 2-D spatial simulation and a 1-D temporal simulation, so that off-chip memory can be fully exploited. Our design employs multiple on-chip buffer structures to reduce the time spent transferring data between on-chip and off-chip memory. We also design a dataflow inverse fast Fourier transform (IFFT) processing engine (PE). The dataflow implementation overlaps butterfly operations, increasing the concurrency and overall throughput of the PE. Experimental results on a Xilinx Zynq XC7Z100 SoC platform show that the design achieves the same accuracy with higher efficiency.
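
The space-time decomposition described above relies on the separability of the multidimensional inverse discrete Fourier transform: a 3-D inverse transform over (time, y, x) gives the same result as a 2-D spatial inverse transform applied to each temporal-frequency slice, followed by a 1-D temporal inverse transform at each spatial position. The following Python sketch illustrates that property on a tiny random spectrum, together with a generic MNLT step that maps a unit-variance Gaussian sample to a target amplitude distribution. It is a minimal illustration under stated assumptions, not the architecture proposed in this article: the function names (`idft_1d`, `idft_3d_direct`, `idft_3d_split`, `mnlt_weibull`) are hypothetical, a naive O(n^2) transform stands in for the IFFT engine, and the Weibull target distribution is chosen only for illustration because the abstract does not name the clutter distribution.

```python
import cmath
import math
import random


def idft_1d(x):
    """Naive inverse DFT of a 1-D sequence (O(n^2), illustration only)."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n
        for m in range(n)
    ]


def idft_3d_direct(spec):
    """Direct 3-D inverse DFT over the axes (t, y, x), used as a reference."""
    nt, ny, nx = len(spec), len(spec[0]), len(spec[0][0])
    out = [[[0j] * nx for _ in range(ny)] for _ in range(nt)]
    for a in range(nt):
        for b in range(ny):
            for c in range(nx):
                acc = 0j
                for p in range(nt):
                    for q in range(ny):
                        for r in range(nx):
                            phase = a * p / nt + b * q / ny + c * r / nx
                            acc += spec[p][q][r] * cmath.exp(2j * math.pi * phase)
                out[a][b][c] = acc / (nt * ny * nx)
    return out


def idft_3d_split(spec):
    """2-D spatial inverse DFT per slice, then 1-D temporal inverse DFT."""
    nt, ny, nx = len(spec), len(spec[0]), len(spec[0][0])
    # Spatial stage: each temporal-frequency slice is processed on its own,
    # so only one 2-D slice has to be held in fast memory at a time.
    stage = []
    for p in range(nt):
        rows = [idft_1d(spec[p][q]) for q in range(ny)]  # transform along x
        cols = [idft_1d([rows[q][c] for q in range(ny)]) for c in range(nx)]
        stage.append([[cols[c][b] for c in range(nx)] for b in range(ny)])
    # Temporal stage: one 1-D transform per spatial position.
    out = [[[0j] * nx for _ in range(ny)] for _ in range(nt)]
    for b in range(ny):
        for c in range(nx):
            series = idft_1d([stage[p][b][c] for p in range(nt)])
            for a in range(nt):
                out[a][b][c] = series[a]
    return out


def mnlt_weibull(g, scale=1.0, shape=2.0):
    """Map a unit-variance Gaussian sample g to a Weibull amplitude.

    Matches the Gaussian survival probability Q(g) to the Weibull survival
    function exp(-(y / scale) ** shape) and solves for y.
    """
    q = 0.5 * math.erfc(g / math.sqrt(2.0))  # Gaussian survival probability
    q = max(q, 1e-300)  # guard against log(0) for extremely large g
    return scale * (-math.log(q)) ** (1.0 / shape)


if __name__ == "__main__":
    random.seed(0)
    nt, ny, nx = 3, 2, 4  # deliberately tiny: the reference transform is slow
    spec = [
        [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(nx)]
         for _ in range(ny)]
        for _ in range(nt)
    ]
    direct = idft_3d_direct(spec)
    split = idft_3d_split(spec)
    err = max(
        abs(direct[a][b][c] - split[a][b][c])
        for a in range(nt) for b in range(ny) for c in range(nx)
    )
    print("max |direct - split| =", err)
    print("MNLT of a Gaussian sample at 0:", mnlt_weibull(0.0))
```

In this sketch the spatial stage touches one slice at a time and the temporal stage touches one spatial position at a time, which is the property that would let the bulk of the 3-D data reside in off-chip memory while small working sets are staged in on-chip buffers.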