Abstract

In order to solve the problem of inaccurate description of news content features and user interest features in mobile cloud computing, proposed a multimedia text classification algorithm that utilizes multi-tag potential Dirichlet distribution. The algorithm is based on the traditional latent Dirichlet allocation (LDA) model and assumes a linear relationship between user tags and potential topics. Therefore, a relational matrix is introduced in the LDA model to describe the corresponding relationship between the tag and the topic, so that the probability distribution of the tag on the word can be inferred from the probability distribution of the topic on the word. The algorithm first learns the probability distribution table of label words by Gibbs sampling method, then infers the probability distribution of new documents on labels according to the model parameters, so as to realize the purpose of predicting the corresponding multiple labels of documents. In order to improve the ability of the algorithm to deal with massive data, the parallel algorithm has been improved. Since the bottleneck of the algorithm lies mainly in the serial nature of global variable updating and communication, the core idea of our parallelization is that in massive text training, global delay updating and asynchronous communication will not affect the final training results. Experiments show that the proposed algorithm has greatly improved the training efficiency. The classification accuracy is higher than that of Naive Bayesian algorithm and Support Vector Machine (SVM) algorithm proposed in other literatures. The average classification accuracy can achieve at about 95%, and it can be used as a general parallel framework of supervised LDA algorithm.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.