This paper derives a low-power, low-area, low-delay 2D Finite Impulse Response (FIR) filter architecture from an analysis of a memory-efficient design. Parallel (block) processing is first introduced into the fully direct-form 2D FIR filter, which lets the filter reuse stored input samples and so use its memory more efficiently. A non-separable 2D FIR filter structure with block size L and filter length N is then developed and implemented. The filter's arithmetic module uses high-speed, low-power multipliers and Carry Look Ahead (CLA) adders, and the output is computed by a pipelined adder unit. The proposed architecture is described in Verilog HDL and verified using the NC Simulator and the RTL Compiler synthesis tool in the Cadence environment. The resulting area, power, and delay reports are compared with those of existing memory-efficient 2D FIR filter hardware architectures. With Modified CLA (MCLA) adders and pipelining, the proposed design reduces power consumption by 44% and delay by 20%.

Index Terms: Memory reuse, 2D FIR filter, Low Power Multiplier, Parallel Prefix Adder, Carry Look Ahead adder, pipelining.
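As an illustration of the block-processing and memory-reuse idea summarized above, the following is a minimal behavioral sketch in Python, not the Verilog architecture itself. It assumes an N×N non-separable kernel, zero padding outside the image, and blocks of L adjacent outputs taken along a row; the function names `fir2d_direct` and `fir2d_block` are hypothetical and chosen only for this example.

```python
def fir2d_direct(x, h):
    """Reference model: direct-form 2D FIR, one output sample at a time.

    y[r][c] = sum over i, j of h[i][j] * x[r - i][c - j],
    with samples outside the image treated as zero (an assumption).
    """
    rows, cols = len(x), len(x[0])
    n = len(h)
    y = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0
            for i in range(n):
                for j in range(n):
                    rr, cc = r - i, c - j
                    if rr >= 0 and cc >= 0:
                        acc += h[i][j] * x[rr][cc]
            y[r][c] = acc
    return y


def fir2d_block(x, h, block_size):
    """Block model: L adjacent outputs in a row share one fetched window."""
    rows, cols = len(x), len(x[0])
    n = len(h)
    y = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c0 in range(0, cols, block_size):
            width = min(block_size, cols - c0)  # last block may be shorter
            # Shared window: N rows by (N - 1 + width) columns,
            # read from the image once per block and reused by every output.
            window = [
                [
                    x[r - i][cc] if (r - i) >= 0 and cc >= 0 else 0
                    for cc in range(c0 - (n - 1), c0 + width)
                ]
                for i in range(n)
            ]
            for k in range(width):
                acc = 0
                for i in range(n):
                    for j in range(n):
                        # Input column c0 + k - j sits at window index k - j + n - 1.
                        acc += h[i][j] * window[i][k - j + n - 1]
                y[r][c0 + k] = acc
    return y
```

In this sketch, the sample-at-a-time model touches N·N input samples per output, whereas the block model fetches N·(N − 1 + L) samples for L outputs, which is where the reuse comes from. The two functions return identical results for any block size, so the block version can be checked against the reference.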