Abstract

The n-gram language model is a powerful tool for modeling natural spoken language, but it requires a large spoken-language corpus to estimate reliable model parameters. For estimating n-gram probabilities from sparse data, Katz's (1987) back-off smoothing method is promising; however, it is sometimes unstable because it relies on singleton heuristics derived from Turing's formula. This paper proposes a new back-off method based on the binomial posterior distribution of n-gram probabilities, which achieves stable and more effective n-gram smoothing through a principled calculation formula that requires no heuristics.

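Back-off smoothing redistributes probability mass from observed n-grams to unseen ones. The sketch below illustrates the general flavor of a Bayesian alternative to singleton heuristics: each bigram probability is estimated by its posterior mean under a Beta prior on a binomial likelihood, backing off to the unigram distribution for unseen bigrams. It is a minimal illustration only; the prior parameters, the back-off weight, and the function names are assumptions and do not reproduce the paper's derivation.

```python
# Minimal sketch of Bayesian-style n-gram smoothing with back-off.
# Each observed bigram count is treated as a binomial sample; its
# probability is estimated by the posterior mean under a Beta prior,
# and unseen bigrams back off to a smoothed unigram distribution.
# Prior parameters and back-off weighting are illustrative choices,
# not the paper's formula.

from collections import Counter


def train(tokens):
    """Collect unigram and bigram counts from a token sequence."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(w1, w2, unigrams, bigrams, alpha=0.5, beta=0.5):
    """Estimate P(w2 | w1) via the posterior mean of a binomial
    probability under a Beta(alpha, beta) prior, with back-off to a
    smoothed unigram probability when the bigram was never observed."""
    c12 = bigrams[(w1, w2)]
    c1 = unigrams[w1]
    if c12 > 0:
        # Posterior mean with c12 "successes" out of c1 "trials".
        return (c12 + alpha) / (c1 + alpha + beta)
    # Back off: reserve the residual mass implied by the prior and
    # distribute it in proportion to the smoothed unigram distribution.
    reserved = beta / (c1 + alpha + beta) if c1 > 0 else 1.0
    total = sum(unigrams.values())
    vocab = len(unigrams)
    p_unigram = (unigrams[w2] + alpha) / (total + alpha * vocab)
    return reserved * p_unigram


if __name__ == "__main__":
    text = "the cat sat on the mat and the cat slept".split()
    uni, bi = train(text)
    print(bigram_prob("the", "cat", uni, bi))   # seen bigram
    print(bigram_prob("cat", "mat", uni, bi))   # unseen bigram, backed off
```

As a sketch, the back-off branch above does not renormalize exactly over all unseen successors; a full implementation would compute the back-off weight so that the conditional distribution sums to one, which is the role the paper's posterior-based formula plays in place of Turing-formula heuristics.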