Abstract

Social network services for self-media, such as Weibo, Blog, and WeChat Public, constitute a powerful medium that allows users to publish posts every day. Due to insufficient information transparency, malicious marketing of the Internet from self-media posts imposes potential harm on society. Therefore, it is necessary to identify news with marketing intentions for life. We follow the idea of text classification to identify marketing intentions. Although there are some current methods to address intention detection, the challenge is how the feature extraction of text reflects semantic information and how to improve the time complexity and space complexity of the recognition model. To this end, this paper proposes a machine learning method to identify marketing intentions from large-scale We-Media data. First, the proposed Latent Semantic Analysis (LSI)-Word2vec model can reflect the semantic features. Second, the decision tree model is simplified by decision tree pruning to save computing resources and reduce the time complexity. Finally, this paper examines the effects of classifier associations and uses the optimal configuration to help people efficiently identify marketing intention. Finally, the detailed experimental evaluation on several metrics shows that our approaches are effective and efficient. The F1 value can be increased by about 5%, and the running time is increased by 20%, which prove that the newly-proposed method can effectively improve the accuracy of marketing news recognition.

Highlights

  • BackgroundNew media are forms of media that are native to computers and mobile phones for redistribution

  • This paper has proposed an efficient ensemble learning method to identify marketing intention, which is used to find the marketing news on the Internet and provide a reference value for identifying marketing news

  • Latent Semantic Analysis (LSI) stands for the text set as an m × n-dimensional matrix |X|, where m is the size of the dictionary, n is the number of texts, and the element (i, j) of the matrix is the rate of the ith word in the jth text

Read more

Summary

Background

New media are forms of media that are native to computers and mobile phones for redistribution. Some examples of new media are virtual worlds, social networking, Internet-oriented websites, and self-media platforms [1]. The algorithm needs to train the model according to the news content that has been marked with the correct category and use the model to classify the news of unknown categories automatically. Against this background, this paper has proposed an efficient ensemble learning method to identify marketing intention, which is used to find the marketing news on the Internet and provide a reference value for identifying marketing news

Challenges
Contributions
Organization
Feature Extraction
Text Classification
Approach
Preprocessing
LSI-Word2vec
Extraction Processing
Stacking
Experimental Setup
Experimental Parameters
Metrics
Findings
Conclusions
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.