Utilization of high-order spatial discretizations is an important trend in developing global atmospheric models. As a competitive choice, the multi-moment constrained volume (MCV) method can achieve high accuracy while maintaining similar parallel scalability to classical finite volume methods. In this work, we introduce the development of a hybrid parallel MCV-based global shallow-water model on the cubed-sphere grid. Based on a sequential code, we perform parallelization on both the process and the thread levels. To enable process-level parallelism, we first decompose the six patches of the cubed-sphere in a same 2-D partition and then employ a conflict-free pipe-flow communication scheme for overlapping the halo exchange with computations. To further exploit the heterogeneous computing capacity of an Intel Xeon Phi accelerated supercomputer, we propose a guided panel-based inner---outer partition to distribute workload among the CPUs and the coprocessors. In addition to the above, thread-level parallelism along with various optimizations is done on both the multi-core CPU and the many-core accelerator. Numerical experiments are carried out to validate the correctness of the optimized parallel code and examine its parallel performance. Test results show that both the CPU-only and the hybrid codes scale well to hundreds of processes in terms of both the strong and weak scaling. In particular, the hybrid code can achieve a speedup of $$2.56\times $$2.56× as compared to the CPU-only version. In the largest run on a $$9216\,\times \,9216\,\times \,6$$9216×9216×6 mesh (1.5 billion unknowns), the hybrid code sustains an aggregative performance of 26.5 Tflops with 486 processes (33,534 cores).