With the rapid development and application of artificial intelligence and blockchain, the demand for information and data security has grown, and public-key cryptography, such as Rivest-Shamir-Adleman (RSA) cryptography, plays a significant role in meeting it. Modular exponentiation is a fundamental operation in computer arithmetic and is widely applied in cryptography, including ElGamal cryptography, the Diffie–Hellman key exchange protocol, and RSA cryptography. Implementing modular exponentiation in a residue number system (RNS) yields highly parallel computation and has been adopted in many hardware architectures. While most RNS-based architectures use the RNS Montgomery algorithm with two residue number systems, a recent modular multiplication algorithm with sum residues performs modular reduction in only one residue number system with about the same parallelism. This work shows that high-performance modular exponentiation and RSA cryptography can be implemented in RNS. Both the algorithm and the architecture are improved to achieve high performance at the cost of extra area overhead: a 1024-bit modular exponentiation completes in 0.567 ms on a Xilinx XC6VLX195t-3 platform, using 26 489 slices, 87 357 LUTs, 363 dedicated $18 \times 18$-bit multipliers, and 65 block RAMs.
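To make the source of the parallelism concrete, the following is a minimal Python sketch of square-and-multiply modular exponentiation in which each product is computed channel by channel in an RNS. It is an illustration under stated assumptions, not the algorithm or architecture of this work: the moduli, the function names, and the toy RSA parameters are all hypothetical, and the reduction modulo $n$ is performed after Chinese remainder theorem reconstruction for simplicity, whereas the RNS Montgomery and sum-residue algorithms carry out the reduction inside the RNS.

```python
from math import prod

# Hypothetical toy moduli, pairwise coprime; real designs use many wider channels.
MODULI = (251, 253, 255, 256)
M = prod(MODULI)  # dynamic range of the RNS


def to_rns(x):
    """Represent the integer x by its residues, one per channel."""
    return tuple(x % m for m in MODULI)


def rns_mul(a, b):
    """Channel-wise product; channels are independent, hence parallelisable."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, MODULI))


def from_rns(r):
    """Chinese remainder theorem reconstruction of the integer in [0, M)."""
    x = 0
    for ri, m in zip(r, MODULI):
        Mi = M // m
        x += ri * Mi * pow(Mi, -1, m)  # modular inverse via pow needs Python 3.8+
    return x % M


def mod_mul(a, b, n):
    """Compute a*b mod n, forming the product in RNS.

    Simplification (assumption of this sketch): the reduction mod n is done
    on the reconstructed integer, not inside the RNS.
    """
    if (n - 1) ** 2 >= M:
        raise ValueError("modulus too large for the RNS dynamic range")
    return from_rns(rns_mul(to_rns(a), to_rns(b))) % n


def rns_modexp(base, exp, n):
    """Left-to-right binary (square-and-multiply) exponentiation."""
    result = 1
    base %= n
    for bit in bin(exp)[2:]:
        result = mod_mul(result, result, n)    # square on every bit
        if bit == "1":
            result = mod_mul(result, base, n)  # multiply when the bit is set
    return result
```

As a usage check, with the toy RSA modulus `n = 3233` (61 × 53) and exponents `e = 17`, `d = 2753`, the call `rns_modexp(65, 17, 3233)` can be compared against Python's built-in `pow(65, 17, 3233)`, and applying `rns_modexp` with `d` to the result should recover 65. A 1024-bit exponentiation follows the same loop, with roughly 1024 squarings and a multiplication for each set bit of the exponent, which is why the cost of a single modular multiplication dominates the overall latency.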