Abstract

In every field of scientific enquiry, there is much data and therefore frequent cause to turn to the computer to help process it. This is certainly true of linguists. They use computers to search for examples of grammatical phenomena in large corpora and to collect statistics on their occurrence. They can also use computers to compile lexica, and to compare those lexica with a view to assessing the relatedness of pairs of languages. Activities like these are collectively referred to as Natural Language Processing (NLP). Generally speaking, however, NLP is an engineering rather than a scientific enterprise, much of it devoted to developing technologies such as machine translation, information retrieval, and speech recognition. It would be natural to expect these technological developments to be informed by the results of scientific enquiry carried out by linguists. In other words, it would be natural for them to have a foundation in computational linguistics. But this is rarely the case. Technological development in NLP is based almost entirely on machine-learning models, most of which are wild and fantastical from a linguist’s perspective. This, of course, is an aberration which, fortunately, may be in the course of correction.
In a tightly argued and largely convincing essay elsewhere in this volume, Steven Abney expresses a different view. “Computational linguistics”, he writes, “is not a specialization of linguistics at all, at least not if we take ‘linguistics’ and ‘computational linguistics’ as academic communities defined by their membership.” An academic community is a set of people, and a set is surely defined by its membership, but sets do not confer on their members the right to appropriate names long since claimed by the members of other sets. In this paper, I shall continue to use the term “Computational Linguistics” to refer to an approach to the subject of linguistics that is informed and inspired by computing. With Abney, I shall argue that “Language is a computational system, and there is a depth of understanding that is simply unachievable without a thorough knowledge of computation.”
There is a natural affinity between linguistics and computer science, and it is one that has very little to do with NLP. It arises because human language is one of very few naturally occurring phenomena that are fundamentally digital. Linguists and lay people alike tacitly acknowledge this affinity when they discuss such questions as whether a spider is an insect, whether the vowel in “marry” is the same as the one in “merry”, or whether I can claim that “I heard about the argument in the library” while denying the truth of both “I was in the library” and “The argument was in the library”. Notice that, while a spider may be more or less like an insect, it cannot be more or less an insect. Either it is, or it is not. Likewise with the vowels in “marry” and “merry”. They may sound more or less different in the speech of different people, but the vowels of a particular English speaker’s language constitute a small, fixed set and, in a given dialect, the vowels in these words are instances of either the same member or different members of that set. The sentence about the argument and the library has (at least) two syntactic structures, one of which puts me, and one of which puts the argument, in the library. Language places the phenomena in its purview into absolutely discrete classes, and this is what makes it a digital system.
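The claim that the library sentence has two discrete structures can be checked mechanically. The following is a minimal sketch, not part of the original paper: it assumes a toy grammar in Chomsky normal form (the rule set `BINARY`, the word list `LEXICON`, and the function `count_parses` are all hypothetical names invented for this illustration) and uses a CKY-style chart to count the distinct parse trees of a sentence.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form. This rule set is an assumption made
# for illustration; it is not a grammar proposed in the paper.
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "PP"),   # heard [about ...]
    ("VP", "VP", "PP"),  # the PP modifies the hearing event
    ("PP", "P", "NP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),  # the PP modifies the noun phrase
]
LEXICON = {
    "I": ["NP"],
    "heard": ["V"],
    "about": ["P"],
    "in": ["P"],
    "the": ["Det"],
    "argument": ["N"],
    "library": ["N"],
}


def count_parses(words, root="S"):
    """Count the distinct parse trees of `words` rooted in `root` (CKY chart)."""
    n = len(words)
    chart = defaultdict(int)  # (start, end, category) -> number of trees
    # Fill in the single-word spans from the lexicon.
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, []):
            chart[i, i + 1, cat] += 1
    # Build longer spans from pairs of adjacent shorter spans.
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for parent, left, right in BINARY:
                    chart[start, end, parent] += (
                        chart[start, mid, left] * chart[mid, end, right]
                    )
    return chart[0, n, root]


sentence = "I heard about the argument in the library".split()
print(count_parses(sentence))  # 2
```

Under this toy grammar the count is a whole number, 2: one tree attaches “in the library” to the verb phrase (the hearing took place in the library) and the other attaches it to “the argument”. No intermediate reading is available, which is the sense in which the system is digital.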

Highlights

  • Human language makes contact with the world at two places

  • The two are connected by processes that are essentially and crucially digital in nature and which lie within the purview of computer science

  • Many linguists share a measure of insecurity about the true status of their discipline as a science, an insecurity having to do partly with the nature of the data that the discipline rests on, and partly with the fact that linguistics has had difficulty fully embracing the experimental methods that characterize other sciences

Summary

Introduction

Human language makes contact with the world at two places. It is used to talk about things in the world like objects, abstractions, thoughts, beliefs, facts, and fictions. Language makes itself available to the senses through sounds, symbols, articulatory gestures, and marks on paper, which are part of the world but which have no necessary connection to the first set of things. This is the other point of contact. Following the Swiss linguist Ferdinand de Saussure (de Saussure, 1915), American linguists embraced structuralism, insisting that the most important facts about a language concern the relationships that elements of the language (sounds, words, affixes, phrases) contract with one another, rather than with things in the outside world. They would concentrate on just one of the two points of contact. Since the concern was with what speakers know, and with what they do, it had to countenance mental entities, and this required an abrupt change in the thinking of the linguists who decided to go in this new direction.

Language Data
Digital Systems
Processes
Machine Learning
Zipf’s Law
Experimental Linguistics