Opinion Content Extraction from Web Pages Using Embedded Semantic Term Tree Kernels

Veerappa B Pagi, Ramesh S Wadawadagi

doi:10.1007/978-981-10-6319-0_29

Abstract

The rapid proliferation of user-generated content (UGC) published on the Web in natural language has made automatic Information Extraction (IE) a challenging task. Although numerous models have been proposed in the literature to address Web IE, there is still a growing need for novel techniques that cope with new challenges. In this paper, an approach to extracting opinion content from Web pages using Embedded Semantic Term Tree Kernels (ESTTK) is presented. In traditional tree kernels, the similarity of two production rules is determined by exact string comparison between the peer nodes of the rules. As a result, tree fragments that are semantically identical but worded differently are not matched, even though they can contribute to the similarity of two trees. A mechanism is therefore needed that accounts for the similarity of nodes whose vocabulary and phrases differ but carry relatively analogous knowledge. Hence, the primitive tree kernel function is reformulated to obtain the similarity of nodes by looking up keywords in an opinion lexicon embedded as vectors. Experimental results show that ESTTK yields better prediction performance than conventional tree kernels.
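The contrast between exact and semantic node matching can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Collins-Duffy style subset tree kernel as the "primitive" kernel, a toy two-dimensional lexicon with made-up vectors, cosine similarity as the node similarity, and a hypothetical threshold of 0.8. All function names (`exact_sim`, `semantic_sim`, `delta`, `tree_kernel`) are illustrative.

```python
import math

# Toy stand-in for an opinion lexicon embedded as vectors (values are invented).
LEXICON = {
    "great": (0.9, 0.1),
    "excellent": (0.8, 0.2),
    "terrible": (-0.9, 0.1),
}

def exact_sim(w1, w2):
    """Traditional matching: words count only if the strings are identical."""
    return 1.0 if w1 == w2 else 0.0

def semantic_sim(w1, w2, threshold=0.8):
    """Assumed soft matching: cosine similarity of lexicon vectors above a threshold."""
    if w1 == w2:
        return 1.0
    if w1 not in LEXICON or w2 not in LEXICON:
        return 0.0
    v1, v2 = LEXICON[w1], LEXICON[w2]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    cos = dot / norm
    return cos if cos >= threshold else 0.0

def is_leaf(node):
    # Trees are nested tuples (label, child, ...); words are plain strings.
    return isinstance(node, str)

def delta(n1, n2, sim, lam):
    """Weighted count of common fragments rooted at n1 and n2."""
    if is_leaf(n1) or is_leaf(n2):
        return 0.0
    if n1[0] != n2[0] or len(n1) != len(n2):
        return 0.0
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if is_leaf(c1) and is_leaf(c2):
            score *= sim(c1, c2)          # word-level comparison
        elif is_leaf(c1) or is_leaf(c2) or c1[0] != c2[0]:
            return 0.0                    # productions differ structurally
        else:
            score *= 1.0 + delta(c1, c2, sim, lam)
    return score

def nodes(tree):
    """Yield every node of the tree, including leaf words."""
    yield tree
    if not is_leaf(tree):
        for child in tree[1:]:
            yield from nodes(child)

def tree_kernel(t1, t2, sim, lam=0.5):
    """Sum delta over all node pairs of the two trees."""
    return sum(delta(a, b, sim, lam) for a in nodes(t1) for b in nodes(t2))

def sentence(adjective):
    # Hand-built parse of "the camera is <adjective>".
    return ("S",
            ("NP", ("DT", "the"), ("NN", "camera")),
            ("VP", ("VBZ", "is"), ("JJ", adjective)))

k_exact = tree_kernel(sentence("great"), sentence("excellent"), exact_sim)
k_soft = tree_kernel(sentence("great"), sentence("excellent"), semantic_sim)
k_same = tree_kernel(sentence("great"), sentence("great"), exact_sim)
print(k_exact, k_soft, k_same)
```

With exact matching, the fragments containing "great" and "excellent" contribute nothing; with the assumed semantic matching they contribute almost as much as identical words, so the soft kernel value lies between the exact-match value and the value for identical trees. Whether such a soft kernel remains positive semidefinite depends on the similarity function and is not checked here.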

Full Text