Design and implementation of Word2Vec parallel algorithm based on HPC

Xianyong Yi,Rongge Zheng,Yufeng Chen,Hao Qin,Aoyu Wang

doi:10.1109/cac.2017.8242835

Abstract

Word2Vec (Word to Vector) processes natural language by calculating the cosine similarity. However, the serial algorithm of original Word2Vec fails to satisfy the demands of training of corpus text because of the explosive growth of information. It has become the bottleneck owing to its comparatively low processing efficiency. The High Performance Computing (HPC) specializes in improving the calculation efficiency; therefore, the training efficiency of corpus texts can be greatly improved by parallelizing Word2Vec algorithm. After analyzing the characteristics of the Word2Vec algorithm in detail, we design and implement a parallel Word2Vec algorithm and use it to train corpus text on HPC. Furthermore, the corpus texts of different sizes are collected and trained, and the speed-up ratio is calculated by using the serial algorithm and parallel algorithm of Word2Vec, respectively. The experimental results show that there is a higher speed-up ratio when using the Word2Vec parallel algorithm running on HPC.

Full Text