In this paper, an optimized, high-performance Two-Dimensional (2D) Finite Impulse Response (FIR) filter is designed and its hardware architecture is implemented for image processing applications. The higher-order circularly symmetric 2D FIR filter is designed using a modified McClellan transformation called the P4 transformation. The proposed P4 transformation attains perfect circular symmetry in the contours of the 2D FIR filter with lower complexity and error. Next, the designed filter coefficients are represented in the Canonical Signed Digit (CSD) number format to obtain a multiplier-less design that avoids power-hungry multipliers. To further reduce hardware complexity, the Common Subexpression Elimination (CSE) technique is used to reduce the number of adders, which decreases both area and power. Each sub-filter corresponding to a row of CSD coefficients is realized, and the sub-filters are integrated into a 2D FIR filter using a fully direct-type architecture. The 2D FIR filter architecture is coded in Verilog and synthesized with Cadence tools using a 45 nm CMOS technology library. The delay, Power Consumption (PC), and area reports were generated by the Genus synthesis tool and compared with state-of-the-art works. Compared with the existing filter architectures, the PC, area, and delay of the proposed filter architecture are reduced by a minimum of 13.96%, 69.1%, and 1.4%, respectively. The Area Delay Product (ADP) is between 1.18 and 10.48 times smaller, and the Power Delay Product (PDP) is between 1.063 and 10.69 times smaller, than those of the existing 2D filter architectures.
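To illustrate the CSD step described above, the following Python sketch converts an integer coefficient into signed digits from {-1, 0, +1} with no two adjacent nonzero digits, and then multiplies by that coefficient using only shifts, additions, and subtractions. It is a minimal sketch and not the paper's Verilog implementation: the function names `to_csd` and `csd_multiply` are hypothetical, and the coefficients are assumed to have already been scaled to integers (fixed-point quantization is not shown).

```python
from typing import List


def to_csd(n: int) -> List[int]:
    """Return the CSD digits of integer n, least significant digit first.

    Each digit is -1, 0, or +1, and no two adjacent digits are nonzero.
    (Hypothetical helper for illustration only.)
    """
    digits = []
    while n != 0:
        if n & 1:
            # n mod 4 == 1 -> digit +1; n mod 4 == 3 -> digit -1.
            # Choosing -1 for a run of ones turns it into a single carry.
            d = 2 - (n % 4)
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits


def csd_multiply(x: int, digits: List[int]) -> int:
    """Multiply x by the coefficient encoded in `digits` without a multiplier.

    Every nonzero CSD digit costs one shift plus one add or subtract,
    which is what a multiplier-less hardware datapath would implement.
    """
    acc = 0
    for k, d in enumerate(digits):
        if d == 1:
            acc += x << k
        elif d == -1:
            acc -= x << k
    return acc


if __name__ == "__main__":
    coeff = 23  # binary 10111 has four nonzero bits
    csd = to_csd(coeff)
    print(csd)                    # [-1, 0, 0, -1, 0, 1], i.e. 32 - 8 - 1
    print(csd_multiply(5, csd))   # 115
```

In this example the coefficient 23 needs four nonzero bits in plain binary but only three nonzero CSD digits, so the shift-and-add datapath needs one fewer adder. Sharing repeated digit patterns across coefficients, as CSE does, would reduce the adder count further; that step is not shown here.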