Abstract

One of the biggest desiderata in practical lexicography is the labeling of lexemes by region, in the widest sense of that word. In a highly mobile world, regional labeling is bound to receive more attention, yet it is perhaps the least precise aspect of English dictionaries. Regional labels largely fall into two groups: national terms, such as Americanism or Briticism, and terms of more local provenance, such as Southwestern Ontario or Scottish. In English dictionaries, such labeling suffers from both theoretical neglect and a practical lack of adequate data that dictionary editors can easily use to assess a term’s geographical dimensions. This paper describes a method developed for the second edition of the Dictionary of Canadianisms on Historical Principles (DCHP-2). Using site-restricted web searches in combination with long-term web monitoring, the method rests on a normalization routine that produces “Frequency Indices” that are comparable across domains. Counter to recent lexicographic best practices, it is shown that web-scaled resources (generally preferred by computational linguists and computational lexicographers as both clean and large) are not yet adequate for this task. For better or worse, the unfiltered, messy web, when used with routines and heuristic reasoning that are checked with certain fail-safes, offers the best chance of obtaining geographical information on large numbers of items that would otherwise be labeled subjectively or not at all.

