Abstract

The rapid proliferation of user-generated content (UGC) published on the Web in natural language has made automatic Information Extraction (IE) a challenging task. Despite the numerous models proposed in the literature to address Web IE, there is still a growing demand for novel techniques that cope with new challenges. This paper presents an approach to extracting opinion content from Web pages using Embedded Semantic Term Tree Kernels (ESTTK). In traditional tree kernels, the similarity of two production rules is determined by exact string comparison between the peer nodes in the rules. As a result, semantically identical tree fragments that differ in wording are not matched, even though they could contribute to the similarity of two trees. A mechanism is therefore needed that accounts for the similarity of nodes whose vocabulary and phrases differ but hold relatively analogous knowledge. Hence, the primitive tree kernel function is reconstructed to obtain the similarity of nodes by looking up keywords in an opinion lexicon embedded as vectors. Experimental results show that ESTTK yields better prediction performance than conventional tree kernels.
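The contrast between exact string matching and embedding-based node similarity can be illustrated with a minimal sketch. It is not the ESTTK formulation itself, which the abstract does not spell out. It assumes a standard Collins-Duffy style subset tree kernel in which only the comparison of word-bearing (preterminal) nodes is relaxed to a cosine similarity. The lexicon `LEXICON`, its two-dimensional vectors, the decay parameter `lam`, and all function names are hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical toy "opinion lexicon embedded as vectors" (made-up values).
LEXICON = {
    "good": (0.9, 0.1),
    "great": (0.8, 0.2),
    "bad": (-0.9, 0.1),
}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def word_sim(w1, w2):
    """1.0 for identical words; clipped cosine if both are in the lexicon."""
    if w1 == w2:
        return 1.0
    if w1 in LEXICON and w2 in LEXICON:
        # Assumption: negative similarities are clipped to zero.
        return max(0.0, cosine(LEXICON[w1], LEXICON[w2]))
    return 0.0

# A tree is a nested tuple: (label, child, ...); a preterminal is (tag, "word").
def is_preterminal(node):
    return len(node) == 2 and isinstance(node[1], str)

def production(node):
    """Label of an internal node together with the labels of its children."""
    return (node[0],) + tuple(child[0] for child in node[1:])

def nodes(tree):
    """Yield every non-leaf node of the tree."""
    yield tree
    if not is_preterminal(tree):
        for child in tree[1:]:
            yield from nodes(child)

def delta(n1, n2, lam, soft):
    """Weighted count of common fragments rooted at n1 and n2."""
    if is_preterminal(n1) and is_preterminal(n2):
        if n1[0] != n2[0]:
            return 0.0
        # Exact string comparison (traditional) vs. embedding similarity.
        sim = word_sim(n1[1], n2[1]) if soft else float(n1[1] == n2[1])
        return lam * sim
    if is_preterminal(n1) or is_preterminal(n2):
        return 0.0
    if production(n1) != production(n2):
        return 0.0
    result = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        result *= 1.0 + delta(c1, c2, lam, soft)
    return result

def tree_kernel(t1, t2, lam=0.5, soft=True):
    """Sum delta over all node pairs of the two trees."""
    return sum(delta(a, b, lam, soft) for a in nodes(t1) for b in nodes(t2))

# Two parse trees that differ only in the opinion word.
T1 = ("S", ("NP", ("DT", "the"), ("NN", "movie")),
           ("VP", ("VBZ", "is"), ("JJ", "good")))
T2 = ("S", ("NP", ("DT", "the"), ("NN", "movie")),
           ("VP", ("VBZ", "is"), ("JJ", "great")))

print("exact match:", tree_kernel(T1, T2, soft=False))
print("soft match: ", tree_kernel(T1, T2, soft=True))
```

With `soft=False`, the fragments containing "good" and "great" contribute nothing because the strings differ. With `soft=True`, they contribute in proportion to the cosine similarity of their vectors, so the second score is higher. Clipping negative similarities to zero is a design choice of this sketch, and no claim is made here that the resulting function remains a valid (positive semi-definite) kernel.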
