An Improved TFIDF Algorithm Based on Dual Parallel Adaptive Computing Model

Yuwan Gu, Weikuan Jia, Juan Huan, Yaru Wang, Yuqiang Sun

doi:10.1109/cybermatics_2018.2018.00133

Abstract

A dual parallel cloud computing framework based on GPU (Graphics Processing Unit) and MapReduce is proposed to address the low efficiency of text classification algorithms on large data sets in a stand-alone environment. The framework constructs an adaptive computation process for dual parallel computing and combines it with the advantages of an improved TFIDF (term frequency-inverse document frequency) algorithm, yielding a TFIDF text categorization algorithm that uses dual parallel adaptive computing. The efficiency of the improved TFIDF algorithm is compared across different operating environments and different numbers of computing nodes. The results show that the improved TFIDF algorithm with dual parallel adaptive computing can process massive data effectively and at high speed. As the number of nodes increases, the execution efficiency of the algorithm with dual parallel adaptive computing improves further.
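For orientation, the following is a minimal single-machine sketch of the standard TFIDF weighting, arranged as a map step and a reduce step in the style MapReduce would distribute. It is not the improved algorithm or the GPU/MapReduce framework described in the abstract, whose details are not given here. The function names `map_term_counts` and `tfidf`, the whitespace tokenizer, and the natural-log IDF form are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def map_term_counts(doc_id, text):
    """Map step (hypothetical helper): emit ((term, doc_id), count) pairs."""
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    counts = Counter(text.lower().split())
    return [((term, doc_id), n) for term, n in counts.items()]


def tfidf(docs):
    """Compute standard TFIDF weights for docs given as {doc_id: text}.

    Returns {(term, doc_id): weight} with
    weight = (count / doc_length) * ln(n_docs / doc_frequency).
    """
    # Map: each document is processed independently, so this loop
    # is the part a cluster could run in parallel.
    emitted = []
    for doc_id, text in docs.items():
        emitted.extend(map_term_counts(doc_id, text))

    # Reduce: aggregate document lengths and per-term document frequencies.
    doc_len = defaultdict(int)
    doc_freq = defaultdict(int)
    for (term, doc_id), n in emitted:
        doc_len[doc_id] += n
        doc_freq[term] += 1  # each (term, doc_id) pair is emitted once

    n_docs = len(docs)
    return {
        (term, doc_id): (n / doc_len[doc_id]) * math.log(n_docs / doc_freq[term])
        for (term, doc_id), n in emitted
    }
```

In this sketch, a term that appears in every document receives weight 0, because its inverse document frequency is ln(1). A term concentrated in few documents is weighted higher.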

Full Text