Abstract

Online repositories give businesses opportunities to gather feedback and opinions on products and services in the form of digital deposits. Such deposits can, in turn, influence readers' views and behaviours when misinformation is posted with the intent to deceive or manipulate. Establishing the veracity of these digital deposits could thus bring key benefits to both online businesses and internet users. Although machine learning techniques are well established for classifying text by content, categorising text by veracity remains a challenge for feature set extraction and analysis. To date, text categorisation techniques for veracity have reported a wide and inconsistent range of accuracies, from 57 to 90 per cent. This article evaluates the accuracy of detecting online deceptive text using a logistic regression classifier based on part-of-speech tags extracted from a corpus of known truthful and deceptive statements. An accuracy of 72 per cent is achieved by reducing 42 extracted part-of-speech tags to a six-element feature vector using principal component analysis. The results compare favourably to other studies. Improvements are anticipated from training machine learning algorithms on more complex feature vectors that combine the key features identified in this study with others from disparate feature domains.
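
As an illustration of the first step described above, the following minimal sketch turns one tagged statement into a part-of-speech frequency vector. It is not the study's implementation: the abbreviated tag set, the function name `pos_frequency_vector`, the example statement and the normalisation by statement length are all assumptions made for illustration. In the study, 42 tags were extracted and then reduced to six features by principal component analysis before logistic regression was applied; neither of those later steps is shown here.

```python
from collections import Counter

# Hypothetical, abbreviated tag set for illustration; the study extracted 42 tags.
TAGSET = ["NN", "VB", "JJ", "RB", "PRP"]


def pos_frequency_vector(tagged_tokens, tagset=TAGSET):
    """Return the relative frequency of each tag in `tagset` for one statement.

    `tagged_tokens` is a list of (word, tag) pairs, assumed to come from
    some part-of-speech tagger. Dividing by statement length is an
    assumption; the abstract does not say how tag counts were scaled.
    """
    counts = Counter(tag for _, tag in tagged_tokens)
    total = len(tagged_tokens)
    if total == 0:
        # An empty statement has no tags, so every frequency is zero.
        return [0.0] * len(tagset)
    return [counts[tag] / total for tag in tagset]


# Assumed tagger output for one short statement (invented example).
statement = [
    ("I", "PRP"),
    ("really", "RB"),
    ("loved", "VB"),
    ("this", "DT"),
    ("hotel", "NN"),
]
vector = pos_frequency_vector(statement)
```

Tags outside the chosen tag set (here `DT`) still count towards the statement length, so the vector entries need not sum to one. Vectors built this way for every statement in a labelled corpus would form the matrix passed to dimensionality reduction and classification.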
