Abstract

Cross-Language Information Retrieval (CLIR) retrieves information stored in a language different from that of the user's query. Translation methods commonly used in CLIR include dictionary-based, parallel-corpora, comparable-corpora, machine-translation, ontology-based, and transitive approaches. The query must be translated into the target language and preprocessed, and its similarity to every document in the corpus must then be calculated. The main problems are the time and accuracy of query translation. Moreover, queries are often not written as complete sentences that follow the rules of a language. Preprocessing is also language-dependent: stemming, for example, requires a different method for each language. Indonesian has base words and affixes in the form of prefixes, suffixes, infixes, and confixes, whereas English stemming deals mainly with suffixes. Stemming also takes a long time in text processing. In the Indonesian search engine (SEBI), cross-language tourism news retrieval is implemented using the Google Translate API, which translates the query and all documents into English; Porter's stemming technique, which reduces each term to its stem; and cosine similarity, which scores each document against the query. This approach can deliver cross-language tourism news instantly while increasing the precision and efficiency of the SEBI search engine, although further improvements are needed to make the similarity computation more precise and efficient.
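
The similarity step can be illustrated with a minimal sketch in Python. It is not the SEBI implementation: it assumes the query and documents have already been translated into English, it skips Porter stemming, and it uses raw term frequencies as vector weights, since the abstract does not specify the weighting scheme. The function names `tokenize`, `cosine_similarity`, and `rank` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: text is already in English. Translation and Porter
    # stemming are deliberately left out of this sketch.
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(query_tokens, doc_tokens):
    # Build raw term-frequency vectors (an assumed weighting scheme).
    q, d = Counter(query_tokens), Counter(doc_tokens)
    # Dot product over the terms the two vectors share.
    dot = sum(q[t] * d[t] for t in q.keys() & d.keys())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    # An empty query or document has zero norm; score it as 0.0.
    return dot / norm if norm else 0.0


def rank(query, documents):
    # Return (score, document index) pairs, best match first;
    # ties are broken by the original document order.
    q_tokens = tokenize(query)
    scored = [
        (cosine_similarity(q_tokens, tokenize(doc)), i)
        for i, doc in enumerate(documents)
    ]
    return sorted(scored, key=lambda s: (-s[0], s[1]))
```

Calling `rank("bali beach", ["beach tourism in bali", "stock market news"])` places the first document ahead of the second, because only the first shares terms with the query. Scoring every document against each query in this way is linear in the corpus size, which is one reason the similarity computation is a natural target for the efficiency improvements mentioned above.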
