A Modified Cosine Similarity for Cross Language Information Retrieval

Chatchai Inparaprapan, Kraisak Kesorn

doi:10.4028/www.scientific.net/amr.931-932.1348

Abstract

Millions of documents are available on the Internet, and many of them contain similar content written in different languages by different authors. Unfortunately, existing search engines do not retrieve all documents relevant to a query when that query is written in a single language. Several researchers have therefore put considerable effort into overcoming this problem. The major problems of a cross-language search engine are 1) how to store information in a unified model and represent the information of multilingual documents effectively, and 2) how to rank the retrieved multilingual documents and present them to the user in the right order. This paper addresses the first problem using an ontology model and presents a new ranking technique for a cross-language information retrieval (CLIR) system. Keyword weighting schemes for the ontology and for document sections are introduced, and the cosine similarity formula is modified specifically to support CLIR. The experimental results show that the modified formula obtains more efficient ranking results than the existing method.
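The abstract does not state the modified formula, so the following is only a minimal sketch of the general idea: standard cosine similarity computed over term vectors whose weights depend on the document section in which a keyword appears. The section names, the weight values in `SECTION_WEIGHTS`, and the helper `weighted_vector` are all hypothetical and are not taken from the paper; the ontology-based weighting is not modeled here.

```python
import math
from collections import Counter

# Hypothetical section weights; the abstract gives no actual values.
SECTION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def weighted_vector(sections):
    """Build a term-weight vector from a {section_name: [terms]} mapping.

    Each occurrence of a term adds the weight of its section
    (1.0 for sections not listed in SECTION_WEIGHTS).
    """
    vec = Counter()
    for name, terms in sections.items():
        weight = SECTION_WEIGHTS.get(name, 1.0)
        for term in terms:
            vec[term] += weight
    return vec


def cosine_similarity(a, b):
    """Standard cosine similarity between two sparse term-weight vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as matching nothing
    return dot / (norm_a * norm_b)


# Illustrative data only: a one-term query against a two-section document.
query = Counter({"rice": 1.0})
doc = weighted_vector({"title": ["rice"], "body": ["farm"]})
score = cosine_similarity(query, doc)
```

In this sketch, a keyword found in a heavily weighted section pulls the document vector toward that keyword, which raises the score of documents that mention the query term in prominent positions. In a CLIR setting the terms would first have to be mapped to language-independent concepts, for example through the ontology the paper describes.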
