Abstract

Given the strong distributed storage and parallel processing capabilities of a Hadoop cluster, this paper studies a method for classifying network public opinion based on the Naive Bayes algorithm in a Hadoop environment. The collected public opinion documents are stored on local nodes under the HDFS architecture, and their feature words are extracted in parallel in the MapReduce process. In this way, the Naive Bayes classification algorithm is encapsulated for parallel execution on a cloud computing platform. The performance of the MapReduce-packaged Naive Bayes classifier is verified, and the results show that its execution speed is significantly improved compared with a single server. Its public opinion classification accuracy exceeds 85%, so the method can effectively improve both the performance and the efficiency of network public opinion classification.
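
The parallel feature-word extraction and Naive Bayes encapsulation described above can be illustrated with a minimal, single-process sketch in Python. It only mimics the map and reduce phases in memory and does not use Hadoop or HDFS. The function names, the key layout, the whitespace tokenizer, the add-one (Laplace) smoothing, and the sample documents are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus of (label, text) pairs; not data from the paper.
SAMPLE_DOCS = [
    ("sports", "team wins match"),
    ("sports", "match score goal"),
    ("finance", "stock market falls"),
    ("finance", "market price rises"),
]

def map_document(label, text):
    """Map phase: emit one document-count pair and one pair per word."""
    yield (("doc", label, None), 1)
    for word in text.lower().split():  # assumed whitespace tokenizer
        yield (("word", label, word), 1)

def reduce_counts(pairs):
    """Reduce phase: sum all values that share the same key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def train(labeled_docs):
    """Run the map phase over every document, then reduce the emitted pairs."""
    pairs = (p for label, text in labeled_docs for p in map_document(label, text))
    return reduce_counts(pairs)

def classify(counts, text):
    """Pick the label with the highest log posterior (add-one smoothing assumed)."""
    doc_counts = {k[1]: v for k, v in counts.items() if k[0] == "doc"}
    vocab = {k[2] for k in counts if k[0] == "word"}
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in doc_counts.items():
        # Total number of word occurrences observed for this class.
        class_total = sum(
            v for k, v in counts.items() if k[0] == "word" and k[1] == label
        )
        score = math.log(n_docs / total_docs)  # log prior
        for word in text.lower().split():
            c = counts.get(("word", label, word), 0)
            score += math.log((c + 1) / (class_total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

if __name__ == "__main__":
    model = train(SAMPLE_DOCS)
    print(classify(model, "team goal match"))
```

On a real cluster the map and reduce functions would run on separate nodes over HDFS blocks, and the summed counts would be the only state shared with the classifier; this sketch keeps everything in one process to show the data flow only.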
