Abstract

This study was conducted to provide an up‐to‐date source of word frequency information based on the kinds of reading materials to which high school and first‐year college students are exposed. It began with a comprehensive listing of reading materials from curriculum surveys, state curriculum guides, private school reading lists, research surveys, federal reports, recommended reading lists, and other sources. Materials mentioned most often were sampled, or entire documents were obtained when they were available in electronic form. Included in the sample of reading materials were American and British novels, poetry, drama, essays, biographies, autobiographies, current periodicals of various types, historical documents, and text from an encyclopedia.

A corpus of 14,360,884 words of running text was assembled. This corpus was analyzed using the most sophisticated lexicographic methods available and the following statistics were generated: the overall frequency of occurrence of each word in the corpus, an index of dispersion for each word over 27 text categories, an estimate of the number of occurrences per one million words of running text for each word that would be expected in a similar but different corpus, and a standard frequency index developed from a logarithmic transformation.

This report describes the development of the corpus and the computation of the word frequency indexes. It also compares the corpus with other existing corpora and demonstrates the importance of up‐to‐date word frequency information. The comprehensive listing of reading materials examined and a list of sampled materials are included in the Appendixes.
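The abstract does not give the formulas behind these statistics, so the following Python sketch is only an illustration of the general idea and not the report's method. It assumes an entropy-based dispersion index (normalized so that 1.0 means a word is spread evenly over all categories), a plain per-million rate rather than the report's adjusted estimate for a similar but different corpus, and a log index of the form 10 × (log10(rate per million) + 4). The function name `word_stats` and the input layout are hypothetical.

```python
import math
from collections import Counter


def word_stats(categories):
    """Illustrative word statistics over a categorized corpus.

    categories: dict mapping a category name to a list of word tokens.
    Every formula here is an assumption made for illustration only.
    """
    # Ignore empty categories so per-category rates are always defined.
    categories = {name: toks for name, toks in categories.items() if toks}
    sizes = {name: len(toks) for name, toks in categories.items()}
    counts = {name: Counter(toks) for name, toks in categories.items()}
    total = sum(sizes.values())
    n = len(categories)

    overall = Counter()
    for c in counts.values():
        overall.update(c)

    stats = {}
    for word, freq in overall.items():
        # Relative frequency of the word inside each category.
        rates = [counts[name][word] / sizes[name] for name in categories]
        rate_sum = sum(rates)
        shares = [r / rate_sum for r in rates if r > 0]

        # Assumed dispersion: entropy of the shares divided by log(n).
        entropy = sum(p * math.log(1 / p) for p in shares)
        dispersion = entropy / math.log(n) if n > 1 else 1.0

        # Plain rate per million words (no dispersion adjustment).
        per_million = freq / total * 1_000_000

        # Assumed logarithmic index: 40 corresponds to once per million.
        log_index = 10 * (math.log10(per_million) + 4)

        stats[word] = {
            "frequency": freq,
            "dispersion": dispersion,
            "per_million": per_million,
            "log_index": log_index,
        }
    return stats


if __name__ == "__main__":
    toy = {
        "novels": ["the", "cat", "the"],
        "essays": ["the", "dog", "the"],
    }
    for word, row in sorted(word_stats(toy).items()):
        print(word, row)
```

In this toy input, a word that appears at the same rate in every category receives the maximum dispersion, while a word confined to a single category receives zero. A real analysis would also need the tokenization rules and the 27 category definitions, which the abstract does not specify.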
