
Most previous linguistic studies of web language have focused on the ‘new’ internet registers, such as blogs, instant messages, and tweets. As a result, we know surprisingly little about the patterns of linguistic variation across the full range of registers found on the searchable web. The present paper provides an overview of a project that begins to fill this gap. Rather than collecting texts from only the ‘new’ web registers, the project is based on a large corpus representing a random sample of the entire searchable web. The first analytical step in the project was to identify the types of documents found in that corpus, providing an empirical description of the composition of the searchable web. Multi-Dimensional (MD) analysis was then applied to describe the patterns of register variation found on the searchable web. The MD analysis first identified the sets of co-occurring linguistic features (the ‘dimensions’) in this discourse domain; those dimensions were then used to document the similarities and differences among web registers. Finally, we compare our results to those of previous MD studies, distinguishing patterns peculiar to the web from linguistic patterns found across discourse domains.
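The step in which dimensions are used to compare registers can be illustrated with a small sketch of the conventional MD scoring procedure. This is not the project's code or data. The feature names, counts, register labels, and loadings below are hypothetical toy values. The sketch assumes the factor analysis that identifies the dimensions has already been run. It also simplifies standard practice, which assigns each feature only to the dimension where it loads most strongly. Raw counts are normed per 1,000 words and standardized across the corpus. Each text's dimension score is the sum of the z-scores of its salient positive features minus the sum for its salient negative features. Register means are then compared.

```python
from statistics import mean, pstdev

# Hypothetical toy corpus: raw feature counts plus word count for each text.
TEXTS = {
    "blog_1": {"words": 800, "first_person": 40, "contractions": 24, "nouns": 160},
    "blog_2": {"words": 1200, "first_person": 54, "contractions": 30, "nouns": 250},
    "news_1": {"words": 1000, "first_person": 5, "contractions": 2, "nouns": 310},
    "news_2": {"words": 600, "first_person": 2, "contractions": 1, "nouns": 200},
}
# Hypothetical register label for each text.
REGISTER = {"blog_1": "blog", "blog_2": "blog", "news_1": "news", "news_2": "news"}

# Hypothetical loadings for ONE dimension (assumed output of a factor analysis).
LOADINGS = {"first_person": 0.72, "contractions": 0.65, "nouns": -0.58}
SALIENT = 0.30  # conventional cutoff for a salient loading


def normalized_rates(texts, per=1000):
    """Norm raw counts to rates per `per` words so texts of different lengths are comparable."""
    return {
        name: {f: c / counts["words"] * per for f, c in counts.items() if f != "words"}
        for name, counts in texts.items()
    }


def standardize(rates):
    """Convert each feature's rates to z-scores across all texts in the corpus."""
    features = list(next(iter(rates.values())))
    z = {name: {} for name in rates}
    for f in features:
        values = [r[f] for r in rates.values()]
        mu, sd = mean(values), pstdev(values)
        for name, r in rates.items():
            z[name][f] = (r[f] - mu) / sd if sd > 0 else 0.0
    return z


def dimension_scores(z, loadings, cutoff=SALIENT):
    """Add z-scores of salient positive features; subtract those of salient negative ones."""
    scores = {}
    for name, feats in z.items():
        score = 0.0
        for f, load in loadings.items():
            if abs(load) >= cutoff:
                score += feats[f] if load > 0 else -feats[f]
        scores[name] = score
    return scores


def register_means(scores, register):
    """Average the dimension scores of the texts belonging to each register."""
    groups = {}
    for name, s in scores.items():
        groups.setdefault(register[name], []).append(s)
    return {reg: mean(vals) for reg, vals in groups.items()}


z = standardize(normalized_rates(TEXTS))
scores = dimension_scores(z, LOADINGS)
print(register_means(scores, REGISTER))
```

With these invented values the blog texts score positively and the news texts negatively on the toy dimension. Comparing such register means on each dimension is how MD studies characterize similarities and differences among registers.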
