Rapid progress in biomolecular structure determination has enabled the fine-grained study of microscopic objects such as proteins and RNAs, to broad scientific benefit. However, as the computational algorithms continue to evolve, the added features increase algorithmic complexity and introduce modules that are hard to compute efficiently. This calls for efficient solutions that exploit the rich and varied hardware resources of state-of-the-art supercomputing systems to accelerate these applications. In this paper, we present our efforts to port and optimize the 3D reconstruction of RELION, one of the most popular cryo-EM software packages for biomolecular structure determination, on the latest generation of the Sunway heterogeneous supercomputer. We propose several novel approaches to address the challenges posed by this complex algorithm: a multi-level parallel scheme and operator optimizations to map and scale RELION, strategies to alleviate memory bottlenecks and improve data locality, lock-free writing to minimize write-write conflicts, and pipelining to overlap computation and communication. With all proposed optimizations combined, the computation time is reduced to under 2 hours, achieving 11.9× and 8.9× speedups on two different datasets. The overall design scales to 131,072 cores, increasing parallel efficiency from 33% to 61% and from 46% to 70% on the two datasets, respectively. To the best of our knowledge, this is the first work to fully optimize and scale the 3D reconstruction of RELION on the latest Sunway system.