Abstract

In a Cache Coherent Non-Uniform Memory Access (cc-NUMA) architecture, remote memory access has lower bandwidth and higher latency than local memory access. The gap is especially large on cc-NUMA platforms whose computing nodes are connected by a network, because network latency and bandwidth are much worse than those of the HyperTransport (HT) and PCI Express (PCI-E) buses. To improve application performance on such platforms, a Hybrid Parallel Framework for Computation-intensive Applications (HPFCA) was proposed. HPFCA covers task distribution, data storage, multicore parallelism, and kernel optimization. For multi-node platforms it uses an "MPI+OpenMP/Pthreads" mechanism: MPI provides distributed-memory parallelism across nodes, and OpenMP or Pthreads provides shared-memory parallelism within a node. General matrix multiplication (GEMM) and the fast Fourier transform (FFT) were studied as representative computation-intensive applications on the Godson-3B, and their parallel algorithms were optimized according to HPFCA. Experimental results demonstrated that HPFCA achieves the desired performance on the Godson-3B.
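The two-level decomposition behind "MPI+OpenMP/Pthreads" can be illustrated with a small sketch. This is not the HPFCA implementation, whose details the abstract does not give. It assumes a simple row-block split of GEMM, simulates the node level with an ordinary loop where a real run would use MPI ranks, and uses Python threads in place of OpenMP/Pthreads. All names (`split_rows`, `gemm_rows`, `hybrid_gemm`, `n_nodes`, `threads_per_node`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(n_rows, parts):
    """Split range(n_rows) into `parts` contiguous, near-equal (start, stop) blocks."""
    base, extra = divmod(n_rows, parts)
    blocks, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        blocks.append((start, stop))
        start = stop
    return blocks


def gemm_rows(a, b, start, stop):
    """Compute rows start..stop-1 of C = A * B with a plain triple loop."""
    n_inner, n_cols = len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(n_inner)) for j in range(n_cols)]
        for i in range(start, stop)
    ]


def hybrid_gemm(a, b, n_nodes=2, threads_per_node=2):
    """Two-level row-block GEMM: outer blocks per node, inner blocks per thread."""
    c = []
    # Outer level (assumption): each block would belong to one MPI rank.
    # Here the "nodes" are simply visited one after another in a single process.
    for node_start, node_stop in split_rows(len(a), n_nodes):
        local_a = a[node_start:node_stop]  # rows this node would hold locally
        # Inner level: threads inside the node share local_a and b.
        thread_blocks = split_rows(len(local_a), threads_per_node)
        with ThreadPoolExecutor(max_workers=threads_per_node) as pool:
            futures = [pool.submit(gemm_rows, local_a, b, s, e)
                       for s, e in thread_blocks]
            # Collect in submission order so the rows of C stay in order.
            for future in futures:
                c.extend(future.result())
    return c
```

The sketch only shows how work is partitioned so that each node touches its own rows of A. Pure-Python threads do not run the arithmetic in parallel, so it says nothing about performance; in a real hybrid program the outer loop would become MPI ranks and the inner pool an OpenMP parallel region or a set of Pthreads.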
