Turkish-English cross language information retrieval using LSI

Erbuğ Çelebi, Burak Gunel, Baturman Sen

doi:10.1109/iscis.2009.5291896

Abstract

This paper describes a study of a Turkish-English cross language information retrieval (CLIR) system. One of the biggest obstacles in CLIR studies is access to a bilingual parallel corpus, so the first step of this study was to construct a parallel Turkish-English corpus. The resulting corpus contains 1801 parallel documents and is divided into two parts: one for training the system and one for testing it. Latent semantic indexing (LSI) techniques were applied to the training set to capture the relations between the two languages. After training, we ran a set of test queries to measure the effectiveness of LSI based retrieval on the Turkish-English parallel corpus. Our experimental results show that LSI based CLIR outperforms non-LSI based retrieval, with retrieval success rates of 69% and 26% respectively.
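The training and query steps summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes the commonly used cross-language LSI setup in which each training pair is merged into one dual-language document, a truncated SVD of the term-document matrix defines the latent space, and queries and test documents are folded into that space and compared by cosine similarity. The abstract does not confirm any of these details, and the toy sentences, the rank `k = 3`, and the helper names (`tokenize`, `to_vector`, `fold_in`, `rank`) are all hypothetical.

```python
import numpy as np

# Toy parallel training pairs (hypothetical data, not from the paper's corpus).
train_pairs = [
    ("kedi süt içer", "cat drinks milk"),
    ("köpek et yer", "dog eats meat"),
    ("kedi balık yer", "cat eats fish"),
    ("köpek su içer", "dog drinks water"),
]

def tokenize(text):
    # Whitespace tokenizer; a real system would need proper Turkish handling.
    return text.lower().split()

# Assumption: each training pair is merged into one dual-language document.
merged = [tokenize(tr) + tokenize(en) for tr, en in train_pairs]
vocab = sorted({w for doc in merged for w in doc})
index = {w: i for i, w in enumerate(vocab)}

def to_vector(tokens):
    # Raw term-frequency vector over the training vocabulary.
    v = np.zeros(len(vocab))
    for w in tokens:
        if w in index:
            v[index[w]] += 1.0
    return v

# Term-by-document matrix and its truncated SVD (the LSI training step).
A = np.column_stack([to_vector(doc) for doc in merged])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 3  # number of latent dimensions kept (assumed value)
U_k, s_k = U[:, :k], s[:k]

def fold_in(text):
    # Project a query or unseen document into the k-dimensional LSI space.
    return (to_vector(tokenize(text)) @ U_k) / s_k

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def rank(query, documents):
    # Return document indices sorted by decreasing similarity to the query.
    q = fold_in(query)
    scores = [cosine(q, fold_in(d)) for d in documents]
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)

english_docs = ["cat drinks milk", "dog eats meat"]
ranking = rank("kedi süt", english_docs)

# Without LSI, the Turkish query shares no terms with the English document.
raw_overlap = to_vector(tokenize("kedi süt")) @ to_vector(tokenize(english_docs[0]))
```

In this toy setting the Turkish query and the English documents have no words in common, so direct term matching gives a zero score, while the latent space learned from the merged pairs places each Turkish word next to the English word it co-occurs with. The figures reported in the abstract come from the authors' own corpus and evaluation, not from a sketch like this one.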

Full Text