Abstract
As participants in the TIDES Surprise language exercise, researchers at the University of Massachusetts helped collect Hindi--English resources and developed a cross-language information retrieval system. Components included normalization, stop-word removal, transliteration, structured query translation, and language modeling using a probabilistic dictionary derived from a parallel corpus. Existing technology was successfully applied to Hindi. The biggest stumbling blocks were collection of parallel English and Hindi text and dealing with numerous proprietary encodings.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have