Abstract

Information retrieval (IR) systems are designed to help information seekers retrieve relevant information from vast document collections; the need for such information gave birth to IR systems. Even though different IR systems exist, they cannot meet all users’ expectations. Users with different levels of knowledge express their queries in different ways. As a result, a system may miss the core meaning of a user’s query and retrieve unsatisfactory results. This happens mainly because of the ambiguity of words in natural languages and the mismatch between the expressions used by users and those used by authors. The ambiguities in the Amharic language have a negative impact on the performance of Amharic IR systems. They include spelling variants of the same word, polysemous terms, and synonymous terms. Users who are not fully knowledgeable about the information domain will mostly formulate weak queries and thus end up frustrated with the results returned by an IR system. This research aims at improving the recall of previous work. A statistical co-occurrence technique is used to expand query terms. The main reason for performing query expansion is to provide relevant documents that satisfy the information need behind a user’s query. The statistical co-occurrence method considers terms that frequently appear together with the query term, regardless of their position. The performance of the proposed technique was tested on a prototype system, and the results were compared with those of the previous study. The comparison shows an improvement of 6% in recall and 2% in F-measure. Hence, the statistical co-occurrence method outperformed the bi-gram based IR system.
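The idea of expanding a query with terms that frequently co-occur with it, regardless of position, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact weighting scheme, so this sketch assumes plain document-level co-occurrence counts, whitespace tokenization, and a hypothetical cut-off `k` for the number of expansion terms. The function names and the small English corpus are illustrative only; an Amharic system would also need normalization of spelling variants and stemming before counting.

```python
from collections import Counter, defaultdict


def build_cooccurrence(docs):
    """For each term, count how many documents it shares with every other term.

    Position inside the document is ignored: two terms co-occur
    whenever they appear anywhere in the same document.
    """
    cooc = defaultdict(Counter)
    for doc in docs:
        terms = set(doc.split())  # assumption: whitespace tokenization
        for t in terms:
            for u in terms:
                if t != u:
                    cooc[t][u] += 1
    return cooc


def expand_query(query, cooc, k=2):
    """Append the k terms that co-occur most often with the query terms."""
    query_terms = query.split()
    scores = Counter()
    for t in query_terms:
        scores.update(cooc.get(t, {}))
    for t in query_terms:
        scores.pop(t, None)  # never re-add a term already in the query
    # Highest count first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return query_terms + [term for term, _ in ranked[:k]]


# Illustrative toy corpus (an assumption, not data from the paper).
docs = [
    "coffee export ethiopia",
    "coffee export market",
    "coffee farm ethiopia",
    "football match result",
]
cooc = build_cooccurrence(docs)
print(expand_query("coffee", cooc, k=2))
```

In this toy corpus, "ethiopia" and "export" each share two documents with "coffee", so they are the terms added to the query. A query term that never appears in the corpus is returned unexpanded.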

Highlights

  • As early as 1945 [1], Bush talked about a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility

  • Bush [1] stated, “Instruments are at hand which, if properly developed, will give man access to and command over the inherited knowledge of the ages”. The system envisioned by Bush is an IR system, which is used in all aspects of information retrieval nowadays

  • An IR system goes through a series of steps in order to provide seemingly relevant information to users. It clusters documents into relevant and non-relevant with respect to the query, ranks the relevant documents, and displays them according to a certain similarity measurement [7]

Summary

Introduction

As early as 1945 [1], Bush talked about a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. An IR system goes through a series of steps in order to provide seemingly relevant information to users. It clusters documents into relevant and non-relevant with respect to the query, ranks the relevant documents, and displays them according to a certain similarity measurement [7]. IR systems should incorporate more sophisticated techniques and strategies that enable them to cope with the problem and to take a more user-centered approach. Such techniques and strategies include ontology-based query expansion, meaning-oriented thesaurus usage, expanding the whole query with common expansion terms, and strategies that seek meaning in users’ queries. Even though many kinds of query expansion methods exist, their goal is the same: making the user’s search task as easy as possible and delivering information that satisfies the user’s information need. A further goal is to enhance the precision of the system so that more relevant documents can be found among the retrieved ones.
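The notion of finding relevant documents among the retrieved ones corresponds to the standard precision, recall, and F-measure definitions. The following is a minimal sketch under stated assumptions: the function name and the document identifiers are hypothetical, and the balanced F-measure (harmonic mean of precision and recall) is assumed, since the summary does not say which variant the study used.

```python
def precision_recall_f(retrieved, relevant):
    """Return (precision, recall, F-measure) for one query.

    retrieved: ids of the documents the system returned
    relevant:  ids of the documents judged relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    # Balanced F-measure; defined as 0 when nothing relevant was retrieved.
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Illustrative judgments (assumed, not taken from the paper).
p, r, f = precision_recall_f(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(f"precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```

Query expansion typically raises recall by retrieving more of the relevant documents, while precision depends on how many of the additional retrieved documents are in fact relevant, which is why both measures are usually reported together.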

Amharic Writing System and Ambiguities
Query Expansion Based on Statistical Co-Occurrence
Performance Evaluation
Findings
Conclusion