Abstract

Word sense disambiguation (WSD) is used in many areas of natural language processing. One approach is the decision list algorithm, a supervised method. Supervised methods are considered the most accurate machine learning algorithms, but they are strongly affected by the knowledge acquisition bottleneck: their performance depends on the size of the tagged training set, which is difficult, time-consuming, and costly to prepare. The method proposed in this article improves the performance of the decision list algorithm when only a small tagged training set is available. It applies a statistical collocation extraction method to a large untagged corpus to identify the more important collocations, which are the features used to build the learning hypotheses. Weighting these features improves the efficiency and accuracy of a decision list algorithm trained on a small corpus.
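The idea can be illustrated with a minimal sketch. It is not the article's implementation: the abstract does not name the collocation statistic or the weighting formula, so the sketch assumes positive pointwise mutual information (PMI) computed per sentence and a multiplicative weight of 1 + PMI on a smoothed log-likelihood rule score. The function names, the smoothing constant `alpha`, and the toy "bass" data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def collocation_weights(untagged_sentences, target):
    """Weight each word by 1 + positive PMI with `target` (assumed statistic)."""
    n = len(untagged_sentences)
    df = Counter()  # number of sentences containing each word
    co = Counter()  # number of sentences containing both the word and target
    for sentence in untagged_sentences:
        words = set(sentence)
        df.update(words)
        if target in words:
            co.update(words - {target})
    weights = {}
    for word, c in co.items():
        pmi = math.log((c * n) / (df[target] * df[word]))
        weights[word] = 1.0 + max(pmi, 0.0)
    return weights


def train_decision_list(tagged, weights=None, alpha=0.1):
    """Build (score, feature, sense) rules sorted by descending score.

    `tagged` is a list of (context_words, sense) pairs for one ambiguous word.
    """
    weights = weights or {}
    senses = sorted({sense for _, sense in tagged})
    counts = defaultdict(Counter)  # feature -> sense -> count
    for words, sense in tagged:
        for word in set(words):
            counts[word][sense] += 1
    rules = []
    for word, c in counts.items():
        best = max(senses, key=lambda s: c[s])
        rest = sum(c[s] for s in senses if s != best)
        # Smoothed log-likelihood ratio of the best sense against the others.
        score = math.log((c[best] + alpha) / (rest + alpha))
        score *= weights.get(word, 1.0)  # unseen collocations keep weight 1
        if score > 0:  # drop features that do not discriminate
            rules.append((score, word, best))
    rules.sort(key=lambda rule: rule[0], reverse=True)
    return rules


def classify(rules, words, default):
    """Return the sense of the highest-scoring rule whose feature is present."""
    present = set(words)
    for _, feature, sense in rules:
        if feature in present:
            return sense
    return default


# Toy data (hypothetical): a small tagged set and a larger untagged corpus.
tagged = [
    (["caught", "bass", "river"], "fish"),
    (["bass", "guitar", "player"], "music"),
    (["fishing", "bass", "lake", "river"], "fish"),
    (["bass", "guitar", "loud"], "music"),
]
untagged = [
    ["bass", "guitar", "solo"],
    ["river", "bass", "fishing"],
    ["the", "guitar", "shop"],
    ["bass", "river", "bank"],
    ["the", "weather", "today"],
]

weights = collocation_weights(untagged, "bass")
rules = train_decision_list(tagged, weights)
print(classify(rules, ["a", "bass", "guitar", "riff"], default="unknown"))
```

In this sketch the untagged corpus only reorders the rules learned from the tagged set, so collocations that co-occur strongly with the target word are consulted first. Whether the article combines the two signals in this way is an assumption.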

Highlights

  • Every language has words with multiple meanings and uses, whose meaning is determined by the context in which they appear

  • This article focuses on the adverse impact of a small sense-tagged corpus on the disambiguation of homographs by supervised methods

  • Supervised word sense disambiguation requires much more tagged data than other machine learning tasks


Summary

Introduction

Every language has words with multiple meanings and uses, whose meaning is determined by the context in which they appear; that is, these words are ambiguous. The context can be a sentence or a phrase. Disambiguating the meaning of such words, known as word sense disambiguation (WSD), is a research area in natural language processing and is used in information retrieval (IR), machine translation (MT), information extraction, and document classification. Ambiguous words are divided into two categories according to how distinct their meanings are.

