A parallel implementation of a nonlinear pseudo-spectral MHD code for the simulation of turbulent dynamos in spherical geometry is reported. It employs a dual domain decomposition technique in both real and spectral space. It is shown that this method shows nearly ideal scaling going up to 128 CPUs on Beowulf-type clusters with fast interconnect. Furthermore, the potential of exploiting single precision arithmetic on standard x86 processors is examined. It is pointed out that the MHD code thereby achieves a maximum speedup of 1.7, whereas the validity of the computations is still granted. The combination of both measures will allow for the direct numerical simulation of highly turbulent cases ( 1500 < Re < 5000 ), which have been previously impractical, given today's computational speed.