Hybrid density functional approximations (DFAs) offer compelling accuracy for abinitio electronic-structure simulations of molecules, nanosystems, and bulk materials, addressing some deficiencies of computationally cheaper, frequently used semilocal DFAs. However, the computational bottleneck of hybrid DFAs is the evaluation of the non-local exact exchange contribution, which is the limiting factor for the application of the method for large-scale simulations. In this work, we present a drastically optimized resolution-of-identity-based real-space implementation of the exact exchange evaluation for both non-periodic and periodic boundary conditions in the all-electron code FHI-aims, targeting high-performance central processing unit (CPU) compute clusters. The introduction of several new refined message passing interface (MPI) parallelization layers and shared memory arrays according to the MPI-3 standard were the key components of the optimization. We demonstrate significant improvements of memory and performance efficiency, scalability, and workload distribution, extending the reach of hybrid DFAs to simulation sizes beyond ten thousand atoms. In addition, we also compare the runtime performance of the PBE, HSE06, and PBE0 functionals. As a necessary byproduct of this work, other code parts in FHI-aims have been optimized as well, e.g., the computation of the Hartree potential and the evaluation of the force and stress components. We benchmark the performance and scaling of the hybrid DFA-based simulations for a broad range of chemical systems, including hybrid organic-inorganic perovskites, organic crystals, and ice crystals with up to 30 576 atoms (101 920 electrons described by 244 608 basis functions).