Abstract

The internet and Web 2.0 gave rise to a wide variety of user-generated content, which caused massive growth in the amount and availability of opinionated information. This collection of complex, unstructured information is often referred to as Big Data. A common practical application of such data is social media sentiment analysis, whose general aim is to determine or extract the opinion contained within a piece of text. A very active line of work applies existing machine learning methods to sentiment analysis problems, for example support vector machines, a popular kernel method for text classification. This paper focuses on sequence kernels, which have been successfully employed for various natural language processing tasks, including sentiment analysis. Advanced methods have been developed for combining multiple information sources in a single kernel function, in particular the factored sequence kernel, which is a natural fit for text classification tasks because each element in a piece of text has other linguistic dimensions, or factors. This paper proposes an extension of the gap-weighted soft-matching factored sequence kernel that is proportional not only to the number of factors considered but also to the total number of matching factors. This allows the kernel to make a stronger distinction between composite subsequences where only a single feature is matched and those where more (or all) features are matched, a distinction that is not always possible through weighting. We use a tridimensional representation in which each sentence is a composite sequence of its words, its part-of-speech tags, and its sentiment features. We evaluate the impact of the proposed methodology on two sentiment classification tasks: subjectivity classification and polarity classification. We perform a series of 10-fold cross-validation experiments on two publicly available corpora.
Our experimental results show that our approach surpasses the original factored sequence kernel in almost every experiment, opening the way for future research on other extensions to the factored kernel.
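
To make the idea concrete, the following is a minimal brute-force sketch of a gap-weighted, soft-matching kernel over tridimensional tokens (word, part-of-speech tag, sentiment feature). It is an illustration under stated assumptions, not the formulation used in the paper: the function names, the uniform factor weights, the decay value, and in particular the scaling by the fraction of matching factors in `match_count_similarity` are hypothetical choices made only to show how counting matching factors can separate partial matches from full matches more strongly than a weighted sum does.

```python
from itertools import combinations

def factor_similarity(x, y, weights):
    """Baseline soft match: weighted sum over the factors that agree."""
    return sum(w for a, b, w in zip(x, y, weights) if a == b)

def match_count_similarity(x, y, weights):
    """Hypothetical variant: additionally scale by the fraction of matching factors.
    This scaling is an assumption for illustration, not the paper's definition."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return factor_similarity(x, y, weights) * matches / len(weights)

def gap_weighted_kernel(s, t, n, lam, sim):
    """Brute-force gap-weighted subsequence kernel of order n.
    Every pair of length-n index tuples contributes the product of token
    similarities, decayed by lam raised to the sum of the two spans."""
    total = 0.0
    for i in combinations(range(len(s)), n):
        span_s = i[-1] - i[0] + 1
        for j in combinations(range(len(t)), n):
            span_t = j[-1] - j[0] + 1
            prod = 1.0
            for a, b in zip(i, j):
                prod *= sim(s[a], t[b])
            total += prod * lam ** (span_s + span_t)
    return total

# Each token is (word, POS tag, sentiment feature); the values are made up.
s = [("good", "JJ", "pos"), ("movie", "NN", "neu")]
t = [("great", "JJ", "pos"), ("movie", "NN", "neu")]
weights = (1 / 3, 1 / 3, 1 / 3)  # assumed uniform factor weights

base = gap_weighted_kernel(
    s, t, n=2, lam=0.5, sim=lambda x, y: factor_similarity(x, y, weights))
ext = gap_weighted_kernel(
    s, t, n=2, lam=0.5, sim=lambda x, y: match_count_similarity(x, y, weights))
print(base, ext)
```

In this toy pair, the first tokens agree on two of three factors and the second tokens agree on all three, so the match-count variant assigns a lower score than the baseline to the partially matching subsequence while leaving a full match unchanged. The enumeration is exponential in n and is meant only for short sentences; a practical implementation would use the usual dynamic-programming recursion for gap-weighted kernels.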
