Elliptic curve cryptography (ECC) is widely used for secure communications, because it can provide the same level of security as RSA with a much smaller key size. In constrained environments, it is important to consider efficiency, in terms of execution time and hardware costs. Modular inversion is a key time-consuming calculation used in ECC. Its hardware implementation requires extensive hardware resources, such as lookup tables and registers. We investigate the state-of-the-art modular inversion algorithms, and evaluate the performance and cost of the algorithms and their hardware implementations. We then propose a high-radix modular inversion algorithm aimed at reducing the execution time and hardware costs. We present a detailed radix-8 hardware implementation based on 256-bit primes in Verilog HDL and compare its cost performance to other implementations. Our implementation on the Altera Cyclone V FPGA chip used 1227 ALMs (adaptive logic modules) and 1037 registers. The modular inversion calculation took 3.67 ms. The AT (area–time) factor was 8.30, outperforming the other implementations. We also present an implementation of ECC using the proposed radix-8 modular inversion algorithm. The implementation results also showed that our modular inversion algorithm was more efficient in area–time than the other algorithms.