The purpose of this study is to investigate the hybrid parallelization of the Stochastic Gradient Descent (SGD) algorithm for solving the matrix completion problem on a high-performance computing platform. We propose a hybrid parallel decentralized SGD framework with asynchronous inter-process communication and a novel flexible partitioning scheme to attain scalability up to hundreds of processors. We utilize Message Passing Interface (MPI) for inter-node communication and POSIX threads for intra-node parallelism. We tested our method by using different real-world benchmark datasets. Experimental results on a hybrid parallel architecture showed that, compared to the state-of-the-art, the proposed algorithm achieves 6× higher throughput on sparse datasets, while it achieves comparable throughput on relatively dense datasets.