The finite field modular multiplier (MM) over GF(2^m) based on the irreducible polynomials recommended by the National Institute of Standards and Technology (NIST) is a critical component of many cryptosystems. However, much of the existing work either supports only a single curve or is very inefficient. In this paper, we present a unified (hybrid field size) algorithm based on the Karatsuba algorithm (KA), together with its corresponding architecture, for high-performance implementation of a systolic modular multiplier over GF(2^m) that supports all the NIST recommended polynomials. Several techniques are used to realize a high-performance implementation of the multiplier. First, we propose a three-term KA-based MM approach and adapt it for systolic implementation. Second, we propose a configurable, unified irreducible polynomial that supports all five polynomials recommended by NIST, along with the corresponding modular reduction method. Third, we design the corresponding systolic structure in accordance with the first two techniques to improve efficiency, and we adopt register sharing and computation unit sharing to reduce the space complexity of the proposed structure. Additionally, two 163-bit MMs are computed in parallel to further improve resource utilization. Synthesis results on ASIC show that the proposed multipliers have significantly lower latency and higher performance than existing designs. For example, the proposed structure achieves up to a 73% reduction in area-delay product (ADP) over existing structures.
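To make the first two techniques concrete, the following is a minimal software sketch of a three-term KA multiplication over GF(2)[x] followed by reduction modulo a NIST polynomial. It is an illustration only, not the systolic architecture or the unified reduction method proposed in this paper. Field elements are assumed to be encoded as Python integers with bit i holding the coefficient of x^i, and the names `NIST_POLY`, `clmul`, `ka3_mul`, `reduce_nist`, and `gf2m_mul` are hypothetical helpers introduced here.

```python
# Low-order exponents of the five NIST-recommended irreducible polynomials,
# e.g. m = 163 stands for x^163 + x^7 + x^6 + x^3 + 1.
NIST_POLY = {
    163: (7, 6, 3, 0),
    233: (74, 0),
    283: (12, 7, 5, 0),
    409: (87, 0),
    571: (10, 5, 2, 0),
}


def clmul(a, b):
    """Schoolbook carry-less (GF(2)[x]) product, used for the sub-products."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def ka3_mul(a, b, m):
    """Three-term Karatsuba product of two polynomials of degree below m.

    Each operand is split into three k-bit limbs, and six limb products
    replace the nine that a schoolbook limb-wise product would need.
    """
    k = (m + 2) // 3  # limb width, ceil(m / 3)
    mask = (1 << k) - 1
    a0, a1, a2 = a & mask, (a >> k) & mask, a >> (2 * k)
    b0, b1, b2 = b & mask, (b >> k) & mask, b >> (2 * k)

    d0, d1, d2 = clmul(a0, b0), clmul(a1, b1), clmul(a2, b2)
    d01 = clmul(a0 ^ a1, b0 ^ b1)
    d02 = clmul(a0 ^ a2, b0 ^ b2)
    d12 = clmul(a1 ^ a2, b1 ^ b2)

    # Recombine; addition and subtraction in GF(2) are both XOR.
    c = d0
    c ^= (d01 ^ d0 ^ d1) << k
    c ^= (d02 ^ d0 ^ d1 ^ d2) << (2 * k)
    c ^= (d12 ^ d1 ^ d2) << (3 * k)
    c ^= d2 << (4 * k)
    return c


def reduce_nist(c, m):
    """Reduce c modulo the NIST polynomial of degree m.

    Uses x^m = sum of the low-order terms, folding the high part down
    until the result fits in m bits.
    """
    low = NIST_POLY[m]
    while c.bit_length() > m:
        hi = c >> m
        c &= (1 << m) - 1
        for e in low:
            c ^= hi << e
    return c


def gf2m_mul(a, b, m):
    """Modular multiplication in GF(2^m) for any m in NIST_POLY."""
    return reduce_nist(ka3_mul(a, b, m), m)
```

As a quick check, `gf2m_mul(1 << 162, 2, 163)` multiplies x^162 by x and returns 201, the integer encoding of x^7 + x^6 + x^3 + 1. Selecting the polynomial through the parameter `m` loosely mirrors the idea of one configurable multiplier serving all five field sizes, although a hardware design would fix the limb width and share registers rather than loop in software.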