Source models for natural language text

Ian H Witten, Timothy C Bell

doi:10.1016/s0020-7373(05)80033-1

Abstract

A model of a natural language text is a collection of information that approximates the statistics and structure of the text being modeled. The purpose of the model may be to give insight into rules which govern how language is generated, or to predict properties of future samples of it. This paper studies models of natural language from three different, but related, viewpoints. First, we examine the statistical regularities that are found empirically, based on the natural units of words and letters. Second, we study theoretical models of language, including simple random generative models of letters and words whose output, like genuine natural language, obeys Zipf's law. Innovation in text is also considered by modeling the appearance of previously unseen words as a Poisson process. Finally, we review experiments that estimate the information content inherent in natural text.
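The abstract notes that simple random generative models can produce output that obeys Zipf's law. The sketch below illustrates that claim with a "random typing" model of the kind the abstract alludes to. It is an assumption that this matches the paper's exact formulation. In the model, each keystroke is a space with probability `p_space`, and otherwise one of the letters in `alphabet`, chosen uniformly. Rather than sampling text, the sketch computes the exact probability of every word up to `max_len` letters, ranks the words, and fits a straight line to log(probability) against log(rank). The function names and parameter values are hypothetical. Words of length L occupy ranks near M^L, where M is the alphabet size, and each has probability proportional to q^L, where q = (1 - p_space)/M. The asymptotic exponent is therefore ln q / ln M, and a finite fit only approximates it.

```python
import math


def random_typing_probs(alphabet: str, p_space: float, max_len: int) -> list[float]:
    """Exact probabilities of all words of length 1..max_len under a
    hypothetical random-typing model, sorted from most to least likely.

    Each keystroke is a space with probability p_space, otherwise one of
    the letters in `alphabet`, chosen uniformly."""
    m = len(alphabet)
    q = (1.0 - p_space) / m  # probability of one particular letter per keystroke
    probs: list[float] = []
    for length in range(1, max_len + 1):
        # Given that a word has started (first keystroke is a letter), a specific
        # word of this length has probability q**length * p_space / (1 - p_space):
        # its letters, then the terminating space.
        p_word = q ** length * p_space / (1.0 - p_space)
        probs.extend([p_word] * m ** length)  # all m**length words are equally likely
    probs.sort(reverse=True)
    return probs


def zipf_slope(probs: list[float]) -> float:
    """Least-squares slope of log(probability) against log(rank); Zipf's law
    corresponds to a slope of roughly -1."""
    xs = [math.log(rank) for rank in range(1, len(probs) + 1)]
    ys = [math.log(p) for p in probs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


# Illustrative parameters (assumed, not taken from the paper).
probs = random_typing_probs("abc", p_space=0.25, max_len=6)
slope = zipf_slope(probs)
predicted = math.log((1 - 0.25) / 3) / math.log(3)  # asymptotic exponent ln q / ln M
print(f"fitted slope {slope:.2f}, asymptotic exponent {predicted:.2f}")
```

The fitted slope is negative and of order one, which is the power-law rank-frequency shape Zipf's law describes. Because all words of a given length share one probability, the rank-frequency curve is a staircase. A least-squares fit over a finite vocabulary therefore only approximates the asymptotic exponent.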

Journal: International Journal of Man-Machine Studies
Publication Date: May 1, 1990
Citations: 32

Similar Papers

Detecting Emotion from Natural Language Text Using Hybrid and NLP Pre-trained Models
Turkish Journal of Computer and Mathematics Education (TURCOMAT) | VOL. 12
28 Apr 2021

Numerical Analysis of Word Frequencies in Artificial and Natural Language Texts
A Cohen ... S Havlin
Fractals | VOL. 05
01 Mar 1997

Extraction of key attributes from natural language requirements specification text
S Geetha ... G.S.A Mala
01 Jan 2013

A Universal Lexical Steganography Technique
Ahmad Alabish ... Anes Enakoa
International Journal of Computer and Communication Engineering
01 Jan 2013
