Abstract

This paper presents a method for discriminating personal opinions from commercial speech. Many personal opinions, such as complaints about a particular product, can now be found on the Web, and mining useful information from them is important for a wide range of applications. However, personal opinions on the Web are often contaminated with commercial speech, that is, text generated by companies and individuals with the intent of making a profit. A data cleaning process that separates personal opinions from commercial speech is therefore important for obtaining useful results in opinion mining. As such a data cleaning method, we propose a language-independent method for discriminating personal opinions from commercial speech based on the subjectivity of text. Assuming that subjective words occur more frequently in personal opinions than in commercial speech, we define a subjectivity score for each word. By estimating the total subjectivity score of a given text, the proposed method identifies whether the text expresses personal opinions or commercial speech. In experiments using text datasets written in different languages, the proposed method correctly discriminated over 90% of personal opinions from commercial speech.
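The abstract does not give the exact definition of the per-word subjectivity score, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes (1) a word's score is the add-alpha smoothed log ratio of its relative frequency in a labeled personal-opinion corpus versus a labeled commercial corpus, (2) a text's total score is the sum of its word scores, with unseen words contributing zero, and (3) a text is labeled a personal opinion when its total exceeds a threshold of 0. Texts are assumed to arrive already tokenized, which keeps the sketch independent of any particular language's tokenizer. The function names `train_subjectivity_scores`, `total_subjectivity`, and `is_personal_opinion` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def train_subjectivity_scores(
    personal_docs: Iterable[List[str]],
    commercial_docs: Iterable[List[str]],
    alpha: float = 1.0,
) -> Dict[str, float]:
    """Assumed scoring: smoothed log ratio of word frequencies.

    Positive scores mean a word is relatively more frequent in personal
    opinions; negative scores mean it is more frequent in commercial speech.
    """
    personal = Counter(tok for doc in personal_docs for tok in doc)
    commercial = Counter(tok for doc in commercial_docs for tok in doc)
    vocab = set(personal) | set(commercial)
    n_p, n_c, v = sum(personal.values()), sum(commercial.values()), len(vocab)
    scores: Dict[str, float] = {}
    for w in vocab:
        # Add-alpha smoothing avoids log(0) for words seen in only one corpus.
        p = (personal[w] + alpha) / (n_p + alpha * v)
        c = (commercial[w] + alpha) / (n_c + alpha * v)
        scores[w] = math.log(p / c)
    return scores


def total_subjectivity(tokens: List[str], scores: Dict[str, float]) -> float:
    """Sum of per-word scores; words unseen in training contribute 0."""
    return sum(scores.get(t, 0.0) for t in tokens)


def is_personal_opinion(
    tokens: List[str], scores: Dict[str, float], threshold: float = 0.0
) -> bool:
    """Hypothetical decision rule: positive total score -> personal opinion."""
    return total_subjectivity(tokens, scores) > threshold


if __name__ == "__main__":
    # Toy, made-up training data for illustration only.
    personal = [["i", "love", "this", "phone"], ["i", "hate", "the", "battery"]]
    commercial = [["buy", "now", "free", "shipping"], ["best", "price", "buy", "today"]]
    scores = train_subjectivity_scores(personal, commercial)
    print(is_personal_opinion(["i", "really", "love", "it"], scores))  # True
    print(is_personal_opinion(["buy", "now"], scores))  # False
```

Summing scores makes longer texts accumulate more evidence; a length-normalized average or a threshold tuned on held-out data are equally plausible variants, and the paper's actual choice may differ.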
