Abstract

In this paper, we describe language models (LMs) and combinations of them built to support word prediction and completion in Hebrew. We define and apply five general types of LMs: (1) basic LMs (unigrams, bigrams, trigrams, and quadgrams), (2) backoff LMs, (3) LMs integrated with tagged LMs, (4) interpolated LMs, and (5) interpolated LMs integrated with tagged LMs. We compared 16 specific implementations of these LMs on three types of Israeli web newspaper corpora. The best keystroke savings were achieved by the most complex variety, the interpolated LMs integrated with tagged LMs. We therefore conclude that synthesizing all four basic LMs with the tagged LMs, and thereby combining their strengths, leads to the best results.
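
To make the idea of an interpolated LM concrete, the following is a minimal sketch, not the implementation evaluated in the paper. It assumes a simple linear interpolation of only unigram and bigram estimates with a fixed weight `lam`, uses English toy tokens in place of a Hebrew corpus, and omits trigrams, quadgrams, backoff, and the tagged LMs. The function names and the keystroke savings formula (the fraction of characters the user did not have to type) are illustrative assumptions.

```python
from collections import Counter, defaultdict


def train(tokens):
    """Count unigrams and bigrams from a list of tokens."""
    unigrams = Counter(tokens)
    bigrams = defaultdict(Counter)
    for prev, word in zip(tokens, tokens[1:]):
        bigrams[prev][word] += 1
    return unigrams, bigrams


def interpolated_prob(word, prev, unigrams, bigrams, lam=0.7):
    """Assumed form: P(word | prev) = lam * P_bigram + (1 - lam) * P_unigram."""
    total = sum(unigrams.values())
    p_uni = unigrams[word] / total if total else 0.0
    # Check membership first so the defaultdict is not grown by lookups.
    context_total = sum(bigrams[prev].values()) if prev in bigrams else 0
    p_bi = bigrams[prev][word] / context_total if context_total else 0.0
    return lam * p_bi + (1 - lam) * p_uni


def complete(prefix, prev, unigrams, bigrams, k=3, lam=0.7):
    """Return up to k known words starting with prefix, most probable first."""
    candidates = [w for w in unigrams if w.startswith(prefix)]
    # Sort by descending probability, breaking ties alphabetically.
    candidates.sort(
        key=lambda w: (-interpolated_prob(w, prev, unigrams, bigrams, lam), w)
    )
    return candidates[:k]


def keystroke_savings(total_chars, keys_pressed):
    """Assumed metric: share of characters the user did not have to type."""
    return (total_chars - keys_pressed) / total_chars
```

For example, after `unigrams, bigrams = train("the cat sat on the mat the cat ran".split())`, calling `complete("", "the", unigrams, bigrams, k=2)` ranks the words seen after "the" ahead of the rest of the vocabulary, while the unigram term still gives unseen continuations a nonzero score.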
