Abstract

This paper presents sense clustering of multi-sense words in Afan Oromo. The main idea is to cluster the contexts in which a target word occurs, which provides a useful way to discover semantically related senses. Similar contexts of a given sense of the target word are clustered using three hierarchical and two partitional clustering algorithms. All contexts of related senses are included, so clustering is performed over all the contexts in the corpus. The underlying hypothesis is that clustering captures the unity reflected among the contexts and that each cluster reveals possible relationships among them. The experiments show that, of the five clustering algorithms, EM and K-means yield significantly higher accuracy than the hierarchical algorithms (single-linkage, complete-linkage and average-linkage clustering); for Afan Oromo, EM and K-means therefore improve the accuracy of sense clustering over hierarchical clustering. Each cluster represents a unique sense, and the target words have between two and five senses. The average accuracy on the test set was 85.5%, which is encouraging for an unsupervised machine learning approach. With this approach, finding the right number of clusters is equivalent to finding the number of senses. The result is encouraging, particularly given the approach's low resource requirements.

Highlights

  • One of the most critical tasks in natural language processing (NLP) applications concerns semantics

  • Given instances of a target word used in a number of different contexts, word sense disambiguation is the process of grouping these instances into clusters that refer to the same sense

  • The underlying hypothesis is that clustering the contexts of a target word (Figure 2) captures the unity reflected among the contexts and that each cluster reveals possible relationships among these contexts


Summary

Introduction

One of the most critical tasks in natural language processing (NLP) applications concerns semantics. Given instances of a target word used in a number of different contexts, word sense disambiguation is the process of grouping these instances into clusters that refer to the same sense. Approaches to this problem are often based on the strong contextual hypothesis of [2], which states that two words are semantically related to the extent that their contextual representations are similar. Clustering contextually (and semantically) similar instances of text can be used in a variety of natural language processing tasks, such as synonymy identification, text summarization and document classification. SenseClusters has been used for applications such as email sorting and automatic ontology construction [5].
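The idea of grouping a target word's contexts by similarity can be illustrated with a minimal sketch. It is not the authors' implementation: the four English contexts for the word "bank", the small stop-word list, and the function names are all hypothetical and chosen only for illustration, and the sketch covers just the hierarchical (single, complete and average linkage) side of the comparison, not EM or K-means. Each context becomes a bag-of-words vector, similarity is measured with the cosine, and the two most similar clusters are merged until the requested number of clusters remains.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list, only large enough for the toy contexts below.
STOPWORDS = {"the", "a", "in", "on", "and", "was", "with", "of"}


def context_vector(context, target):
    """Bag-of-words counts for one context, without stop words or the target."""
    words = context.lower().split()
    return Counter(w for w in words if w != target and w not in STOPWORDS)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def agglomerative(vectors, k, linkage="average"):
    """Merge the two most similar clusters until only k clusters remain."""
    clusters = [[i] for i in range(len(vectors))]

    def similarity(a, b):
        sims = [cosine(vectors[i], vectors[j]) for i in a for j in b]
        if linkage == "single":      # closest pair of members
            return max(sims)
        if linkage == "complete":    # farthest pair of members
            return min(sims)
        return sum(sims) / len(sims)  # average over all pairs

    while len(clusters) > k:
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda pair: similarity(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return [sorted(c) for c in clusters]


# Toy contexts (assumed data, not taken from the paper's corpus).
contexts = [
    "she deposited money in the bank account",
    "the bank approved the loan and the money was paid",
    "they sat on the river bank near the water",
    "the river bank was flooded with water",
]
vectors = [context_vector(c, "bank") for c in contexts]
sense_clusters = agglomerative(vectors, k=2, linkage="average")
print(sense_clusters)  # [[0, 1], [2, 3]]
```

On this toy input the financial contexts (0 and 1) and the river contexts (2 and 3) end up in separate clusters, so each cluster corresponds to one sense. Here the number of clusters `k` is supplied by hand; as the abstract notes, choosing it is equivalent to choosing the number of senses.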
