Abstract

Consumers’ purchase decisions are increasingly influenced by user-generated online reviews. Accordingly, there has been growing concern about the potential for posting deceptive opinion spam: fictitious reviews that have been deliberately written to sound authentic in order to deceive the reader. In this paper, we explore generalized approaches for identifying online deceptive opinion spam based on a new gold-standard dataset, which comprises data from three different domains (hotel, restaurant, and doctor), each containing three types of reviews: customer-generated truthful reviews, Turker-generated deceptive reviews, and employee (domain-expert) generated deceptive reviews. Our approach aims to capture general differences in language usage between deceptive and truthful reviews, which we hope will help consumers make purchase decisions and help review portal operators, such as TripAdvisor or Yelp, investigate possible fraudulent activity on their sites.
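To make the corpus structure concrete, the sketch below shows one plausible way to organize and load such a three-domain, three-source dataset. The directory layout, file naming, and `load_corpus` helper are hypothetical illustrations, not the authors’ released format.

```python
# Hypothetical on-disk layout for a 3-domain x 3-source review corpus.
# All paths and names below are illustrative, not the released dataset.
from pathlib import Path

DOMAINS = ["hotel", "restaurant", "doctor"]
SOURCES = {
    "customer_truthful": "truthful",   # genuine customer reviews
    "turker_deceptive": "deceptive",   # crowdsourced via Mechanical Turk
    "expert_deceptive": "deceptive",   # written by domain experts/employees
}

def load_corpus(root="reviews"):
    """Yield (domain, source, label, text) for each review file."""
    for domain in DOMAINS:
        for source, label in SOURCES.items():
            for path in sorted(Path(root, domain, source).glob("*.txt")):
                yield domain, source, label, path.read_text(encoding="utf-8")
```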

Highlights

  • Consumers increasingly rely on user-generated online reviews when making purchase decisions (Cone, 2011; Ipsos, 2012).

  • Existing approaches to spam detection usually focus on developing supervised learning-based algorithms to help users identify deceptive opinion spam; these algorithms are highly dependent on high-quality gold-standard labeled data (Jindal and Liu, 2008; Jindal et al., 2010; Lim et al., 2010; Wang et al., 2011; Wu et al., 2010).

  • Several follow-up works build on Ott et al.’s dataset, including estimating the prevalence of deception in online reviews (Ott et al., 2012), identifying negative deceptive opinion spam (Ott et al., 2013), and identifying manipulated offerings (Li et al., 2013b).


Summary

Introduction

Consumers increasingly rely on user-generated online reviews when making purchase decisions (Cone, 2011; Ipsos, 2012). Existing approaches to spam detection usually focus on developing supervised learning-based algorithms to help users identify deceptive opinion spam, and these algorithms are highly dependent on high-quality gold-standard labeled data (Jindal and Liu, 2008; Jindal et al., 2010; Lim et al., 2010; Wang et al., 2011; Wu et al., 2010). Approaches for obtaining such labeled data generally fall into two categories. Because recent studies show that deceptive opinion spam is not reliably identified by human readers (Ott et al., 2011), one line of work instead crowdsources deceptive reviews using Amazon Mechanical Turk, as introduced by Ott et al. (2011). Several follow-up works build on Ott et al.’s dataset, including estimating the prevalence of deception in online reviews (Ott et al., 2012), identifying negative deceptive opinion spam (Ott et al., 2013), and identifying manipulated offerings (Li et al., 2013b).
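As a concrete illustration of the supervised setup described above, the sketch below trains a simple bag-of-n-grams classifier on labeled reviews and inspects the n-grams it weights most heavily, in the spirit of capturing differences in language usage between deceptive and truthful reviews. The toy examples and the choice of TF-IDF features with logistic regression are assumptions for illustration, not the paper’s exact pipeline.

```python
# Minimal supervised sketch: classify reviews as truthful (0) or
# deceptive (1) from n-gram features. Toy data; assumed pipeline,
# not the paper's exact method.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

reviews = [  # hypothetical labeled examples
    "The room was clean but the elevator was slow.",
    "Breakfast was fine, though checkout took a while.",
    "Parking cost extra and the wifi kept dropping.",
    "An absolutely perfect luxury escape, best hotel ever!",
    "My family had the most amazing, flawless stay imaginable!",
    "Truly a dream vacation, everything was simply perfect!",
]
labels = np.array([0, 0, 0, 1, 1, 1])  # 0 = truthful, 1 = deceptive

vec = TfidfVectorizer(ngram_range=(1, 2))
X = vec.fit_transform(reviews)
clf = LogisticRegression(max_iter=1000).fit(X, labels)

# The learned weights hint at language-usage cues: large positive
# coefficients lean deceptive, large negative ones lean truthful.
terms = vec.get_feature_names_out()
order = np.argsort(clf.coef_[0])
print("truthful-leaning n-grams:", [terms[i] for i in order[:5]])
print("deceptive-leaning n-grams:", [terms[i] for i in order[-5:]])
print("prediction:", clf.predict(vec.transform(["The best stay of my entire life!"])))
```

A cross-domain evaluation, training on one domain (e.g. hotel) and testing on another (e.g. doctor), would probe the kind of generalization the paper is concerned with.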
