Abstract

Part-of-speech (POS) tagging is the process of assigning a word class to each word in a text. A POS tagger for a specific language is usually built from a general-domain corpus, for example newspaper text. When such a tagger is applied to text from a new or more specialized domain, it may assign word classes incorrectly. Domain adaptation can be approached in several ways: using clustering to change the word representation, using a model with a large lexicon, or training the model on labelled text from the specific domain. In this research, we apply domain adaptation by adding a lexicon built from affix rules. The target domain is beauty products. The system consists of a general-domain POS tagger and an unlabelled lexicon from the target domain. Word classes in the target-domain lexicon are assigned based on affix information, and the remaining words are labelled manually. Observation of the dataset showed that English words are used frequently, so the lexicon was developed in both Indonesian and English. The processed lexicon is added to the lexicon of the original POS tagger to give the general-domain tagger information about the specific domain. This study focuses on the noun, proper noun, adjective, and adverb tags, because the tagger's output is used for aspect and opinion extraction. The tagger with the added lexicon achieves 68.99% accuracy, and 92.36% of words are successfully recognized by the tagger.
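The affix-based labelling and lexicon merging described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the affix rules in `AFFIX_RULES`, the `MIN_STEM` threshold, the Penn-style tag names (NN, JJ, RB), and the merge policy (entries already in the generic lexicon are never overwritten) are all hypothetical, because the abstract does not specify them. Words that no rule matches are returned separately, mirroring the manual-labelling step.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical affix rules for illustration only; the paper's actual rule set
# is not given in the abstract. Each rule is (prefix, suffix, tag); an empty
# string means no constraint on that side of the word.
AFFIX_RULES: List[Tuple[str, str, str]] = [
    ("ke", "an", "NN"),   # Indonesian confix ke-...-an often forms nouns
    ("pe", "an", "NN"),   # Indonesian confix pe-...-an often forms nouns
    ("", "ness", "NN"),   # English noun-forming suffixes
    ("", "ment", "NN"),
    ("", "ly", "RB"),     # English adverb suffix
    ("", "ous", "JJ"),    # English adjective suffixes
    ("", "ful", "JJ"),
]

MIN_STEM = 3  # assumed minimum stem length, so very short words are not matched


def label_by_affix(word: str) -> Optional[str]:
    """Return the tag of the first matching affix rule, or None if none match."""
    w = word.lower()
    for prefix, suffix, tag in AFFIX_RULES:
        stem_len = len(w) - len(prefix) - len(suffix)
        if w.startswith(prefix) and w.endswith(suffix) and stem_len >= MIN_STEM:
            return tag
    return None


def build_domain_lexicon(words: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split domain words into an affix-labelled lexicon and a list left for manual labelling."""
    lexicon: Dict[str, str] = {}
    unlabelled: List[str] = []
    seen: Set[str] = set()
    for word in words:
        w = word.lower()
        if w in seen:
            continue  # skip duplicates
        seen.add(w)
        tag = label_by_affix(w)
        if tag is None:
            unlabelled.append(w)
        else:
            lexicon[w] = tag
    return lexicon, unlabelled


def merge_lexicons(base: Dict[str, str], domain: Dict[str, str]) -> Dict[str, str]:
    """Add domain entries to a copy of the base lexicon, keeping existing base entries."""
    merged = dict(base)
    for word, tag in domain.items():
        merged.setdefault(word, tag)
    return merged


if __name__ == "__main__":
    # Toy stand-in for the generic tagger's lexicon (hypothetical).
    base_lexicon = {"dan": "CC", "kulit": "NN"}
    words = ["Kecantikan", "perawatan", "gently", "luxurious", "serum"]
    domain, unlabelled = build_domain_lexicon(words)
    print(domain)      # {'kecantikan': 'NN', 'perawatan': 'NN', 'gently': 'RB', 'luxurious': 'JJ'}
    print(unlabelled)  # ['serum']
    domain.update({"serum": "NN"})  # stands in for the manual labelling step
    merged = merge_lexicons(base_lexicon, domain)
    print(len(merged))  # 7
```

Keeping base entries on conflict is one possible design choice; a real system might instead let domain labels override generic ones, which would only require replacing `setdefault` with plain assignment.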
