This paper proposes a high-performance radix-4 Montgomery Modular Multiplication (MMM) algorithm and a corresponding hardware architecture for Elliptic Curve Cryptography (ECC), in which the quotient and the partial-product accumulation are computed in parallel in each iteration. The design uses the Redundant Signed Digit (RSD) representation and a Signed Digit Adder (SDA) to eliminate long carry chains, enable this parallel computation, remove pre-computation, and integrate the modular reduction operations. The MMM algorithm is implemented in a 256-bit version on a Xilinx Virtex-6 FPGA and in a 1024-bit version on a Xilinx Virtex-7 FPGA. The 256-bit/1024-bit implementations consume only 1.55k/10.18k Look-Up Tables (LUTs), take 133/517 clock cycles, and run at maximum frequencies of 558.8/641.7 MHz, respectively. In terms of Area-Time Product (ATP), the design achieves an ATP of 0.369 over the 256-bit NIST prime field, approximately half that of state-of-the-art designs. A Scalar Point Multiplication (SPM) design built on this MMM algorithm consumes 14.19k LUTs and completes a single SPM operation in 0.217 ms, with a lower ATP than most existing SPM designs.
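For readers unfamiliar with radix-4 Montgomery multiplication, the digit-serial iteration can be illustrated with a minimal software reference model. This is a sketch under stated assumptions, not the architecture proposed here: it uses ordinary Python integers instead of the RSD representation and the SDA, it computes the quotient digit and the accumulation sequentially rather than in parallel, and the function name `mont_mul_radix4` and its parameters are hypothetical. It only shows that consuming two operand bits per iteration takes 128 iterations for a 256-bit operand.

```python
def mont_mul_radix4(a: int, b: int, m: int, nbits: int) -> int:
    """Return a * b * R^{-1} mod m, with R = 4**ceil(nbits / 2).

    Illustrative integer model only (assumption: 0 <= a, b < m, m odd,
    and m fits in nbits bits). No RSD arithmetic is modelled.
    """
    if m % 2 == 0:
        raise ValueError("modulus must be odd")
    if m.bit_length() > nbits:
        raise ValueError("modulus does not fit in nbits")
    if not (0 <= a < m and 0 <= b < m):
        raise ValueError("operands must be reduced modulo m")

    digits = (nbits + 1) // 2          # number of radix-4 digits of a
    m_prime = (-pow(m, -1, 4)) % 4     # -m^{-1} mod 4 (Python 3.8+)

    s = 0
    for i in range(digits):
        a_i = (a >> (2 * i)) & 3       # i-th radix-4 digit of a
        s += a_i * b                   # partial-product accumulation
        q = (s * m_prime) & 3          # quotient digit: makes s + q*m divisible by 4
        s = (s + q * m) >> 2           # exact division by the radix

    # The loop keeps s < 2*m, so one conditional subtraction suffices.
    if s >= m:
        s -= m
    return s


if __name__ == "__main__":
    # NIST P-256 prime, used here only as a sample 256-bit odd modulus.
    p256 = 2**256 - 2**224 + 2**192 + 2**96 - 1
    x, y = 0x1234567890ABCDEF, 0xFEDCBA0987654321
    r = 4**128
    # Convert to the Montgomery domain, multiply, and convert back.
    xm, ym = (x * r) % p256, (y * r) % p256
    zm = mont_mul_radix4(xm, ym, p256, 256)
    z = mont_mul_radix4(zm, 1, p256, 256)
    print(z == (x * y) % p256)
```

The model returns the product scaled by R^{-1}, so operands are normally kept in the Montgomery domain (multiplied by R modulo m) across a sequence of multiplications and converted back once at the end, as the usage lines at the bottom do.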