Abstract

The tremendous increase in user-generated content (UGC) published on the web in natural language poses a formidable challenge to automated information extraction (IE) and content analysis (CA). Techniques based on tree kernels (TK) have been used successfully to model semantic compositionality in many natural language processing (NLP) applications. Essentially, these techniques compute the similarity of two production rules by exact string comparison between peer nodes. As a result, semantically identical tree fragments are not matched, even though they could contribute to the similarity of two trees. A mechanism is needed that accounts for the similarity of rules that differ in syntax and vocabulary but carry relatively analogous knowledge. In this paper, a hierarchical framework based on the document object model (DOM) tree and linguistic kernels is proposed that jointly addresses subjectivity detection, opinion extraction and polarity classification. The model proceeds in three stages. In the first stage, the content of each DOM tree node is analysed to estimate the complexity of its vocabulary and syntax using a readability test. In the second stage, semantic tree kernels extended with word embeddings are used to classify nodes as containing subjective or objective content. Finally, the content classified as subjective is further examined for opinion polarity classification using fine-grained linguistic kernels. The efficiency of the proposed model is demonstrated through a series of experiments. The results show that the proposed polarity-enriched tree kernel (PETK) achieves better prediction performance than conventional tree kernels.
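The limitation of exact string comparison described above can be illustrated with a minimal sketch. It is not the PETK proposed in the paper: it is a simplified subset-tree-style kernel in which the word comparison at pre-terminal nodes is pluggable. The tree encoding, the function names, the decay value and the toy word vectors are all assumptions made for illustration; a real system would use learned embeddings.

```python
import math

# Trees are nested tuples: (label, child, ...); a leaf word is a plain string.
# Toy 2-d word vectors, invented for illustration only (not learned embeddings).
TOY_VECTORS = {
    "good": (0.9, 0.1),
    "great": (0.85, 0.2),
    "bad": (-0.9, 0.1),
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def label(node):
    return node if isinstance(node, str) else node[0]

def children(node):
    return () if isinstance(node, str) else node[1:]

def exact_match(a, b):
    """Conventional behaviour: words count only if the strings are identical."""
    return 1.0 if a == b else 0.0

def soft_match(a, b):
    """Hypothetical relaxation: fall back to (non-negative) cosine similarity."""
    if a == b:
        return 1.0
    if a in TOY_VECTORS and b in TOY_VECTORS:
        return max(0.0, cosine(TOY_VECTORS[a], TOY_VECTORS[b]))
    return 0.0

def delta(n1, n2, word_sim, decay=0.5):
    """Weighted count of common fragments rooted at n1 and n2."""
    if isinstance(n1, str) or isinstance(n2, str):
        return 0.0
    c1, c2 = children(n1), children(n2)
    if label(n1) != label(n2) or len(c1) != len(c2):
        return 0.0
    # Pre-terminal nodes: compare the words with the chosen similarity.
    if all(isinstance(c, str) for c in c1 + c2):
        sim = 1.0
        for a, b in zip(c1, c2):
            sim *= word_sim(a, b)
        return decay * sim
    # Internal nodes: the production (sequence of child labels) must match.
    if [label(c) for c in c1] != [label(c) for c in c2]:
        return 0.0
    score = decay
    for a, b in zip(c1, c2):
        score *= 1.0 + delta(a, b, word_sim, decay)
    return score

def internal_nodes(tree):
    if isinstance(tree, str):
        return []
    found = [tree]
    for child in children(tree):
        found.extend(internal_nodes(child))
    return found

def tree_kernel(t1, t2, word_sim):
    """Sum delta over all pairs of internal nodes of the two trees."""
    return sum(delta(a, b, word_sim)
               for a in internal_nodes(t1) for b in internal_nodes(t2))

t_good = ("S", ("NP", ("N", "film")), ("VP", ("V", "is"), ("ADJ", "good")))
t_great = ("S", ("NP", ("N", "film")), ("VP", ("V", "is"), ("ADJ", "great")))

k_exact = tree_kernel(t_good, t_great, exact_match)
k_soft = tree_kernel(t_good, t_great, soft_match)
print(k_exact, k_soft)
```

With exact matching, the fragments ending in "good" and "great" contribute nothing, so the two near-synonymous sentences score lower than they do under the relaxed comparison. Clipping the cosine at zero is a simplification chosen here; it is not guaranteed to yield a valid (positive semi-definite) kernel in general.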

