Abstract
The vast amount and diversity of the content shared on social media can pose a challenge for any business wanting to use it to identify potential customers. In this paper, our aim is to investigate the use of both unsupervised and supervised learning methods for target audience classification on Twitter with minimal annotation efforts. Topic domains were automatically discovered from contents shared by followers of an account owner using Twitter Latent Dirichlet Allocation (LDA). A Support Vector Machine (SVM) ensemble was then trained using contents from different account owners of the various topic domains identified by Twitter LDA. Experimental results show that the methods presented are able to successfully identify a target audience with high accuracy. In addition, we show that using a statistical inference approach such as bootstrapping in over-sampling, instead of using random sampling, to construct training datasets can achieve a better classifier in an SVM ensemble. We conclude that such an ensemble system can take advantage of data diversity, which enables real-world applications for differentiating prospective customers from the general audience, leading to business advantage in the crowded social media space.
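The training-set construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data layout (one list of positive examples, one list of negative examples per topic domain), the helper names, and the majority-vote combination are all assumptions made for illustration. The sketch balances each ensemble member's training set by drawing a bootstrap sample (sampling with replacement) from the smaller class, and leaves the SVM training step itself out.

```python
import random
from collections import Counter


def bootstrap_oversample(items, target_size, rng):
    """Draw target_size items from `items` with replacement (a bootstrap sample)."""
    return [rng.choice(items) for _ in range(target_size)]


def build_ensemble_training_sets(positives, negatives_by_domain, rng):
    """Build one balanced training set per topic domain (hypothetical layout).

    The smaller class is over-sampled by bootstrapping until both classes
    have the same number of examples. Labels: 1 = target audience, 0 = other.
    """
    training_sets = {}
    for domain, negatives in negatives_by_domain.items():
        size = max(len(positives), len(negatives))
        pos = positives if len(positives) == size else bootstrap_oversample(positives, size, rng)
        neg = negatives if len(negatives) == size else bootstrap_oversample(negatives, size, rng)
        training_sets[domain] = [(x, 1) for x in pos] + [(x, 0) for x in neg]
    return training_sets


def majority_vote(predictions):
    """Combine the ensemble members' labels (assumed rule; ties go to the label seen first)."""
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the resampling is repeatable
    positives = ["tweet about product A", "tweet about product B"]
    negatives_by_domain = {
        "sports": ["match report", "transfer news", "league table", "injury update"],
        "food": ["recipe link"],
    }
    sets = build_ensemble_training_sets(positives, negatives_by_domain, rng)
    for domain, examples in sets.items():
        print(domain, Counter(label for _, label in examples))
    print(majority_vote([1, 0, 1]))
```

In a full pipeline, each per-domain training set would be used to fit one SVM, and the members' predictions for a follower would then be combined, for example with the vote shown here.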
Highlights
In the age of social media, companies can no longer rely on advertisements or press releases to reach out to their customers
As our study focuses on classifying the target audience from the list of followers, it is of interest to analyse if an unsupervised topic modelling approach, such as Latent Dirichlet Allocation (LDA) [5], can be used to discover topics or domains of interest of the followers and construct negative training datasets based on the domains uncovered
While the 10-fold cross-validation results can be a good indicator of the performance of the various classifiers, it is also of interest to evaluate how the various Support Vector Machine (SVM) ensembles perform on actual followers’ tweets or previously unseen datasets, in order to gauge the ability of the classifiers to identify a target audience from the list of followers
Summary
In the age of social media, companies can no longer rely on advertisements or press releases to reach out to their customers. Even though the 1.28 billion active user base [3] of Twitter, Facebook and other social media platforms can be a valuable source of information for any business, it is not an easy feat to identify a target audience in the crowded social media space. This is mainly because of the challenge of extracting commercially viable content from the vast amount of free-form conversations. Automated machine learning approaches that can help in classifying and
PLOS ONE | DOI:10.1371/journal.pone.0122855 | April 13, 2015