Abstract

Word2Vec (Word to Vector) processes natural language by calculating the cosine similarity. However, the serial algorithm of original Word2Vec fails to satisfy the demands of training of corpus text because of the explosive growth of information. It has become the bottleneck owing to its comparatively low processing efficiency. The High Performance Computing (HPC) specializes in improving the calculation efficiency; therefore, the training efficiency of corpus texts can be greatly improved by parallelizing Word2Vec algorithm. After analyzing the characteristics of the Word2Vec algorithm in detail, we design and implement a parallel Word2Vec algorithm and use it to train corpus text on HPC. Furthermore, the corpus texts of different sizes are collected and trained, and the speed-up ratio is calculated by using the serial algorithm and parallel algorithm of Word2Vec, respectively. The experimental results show that there is a higher speed-up ratio when using the Word2Vec parallel algorithm running on HPC.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.