Lexicographic Tools Research Articles

Lexical collocations have particular statistical distributions. We have developed a set of statistical techniques for retrieving and identifying collocations from large textual corpora. The techniques we developed are able to identify collocations of arbitrary length as well as flexible collocations. These techniques have been implemented in a lexicographic tool, Xtract, which is able to automatically acquire collocations with high retrieval performance. Xtract works in three stages. The first stage is based on a statistical technique for identifying word pairs involved in a syntactic relation. The words can appear in the text in any order and can be separated by an arbitrary number of other words. The second stage is based on a technique to extract n-word collocations (or n-grams) in a much simpler way than related methods. These collocations can involve closed class words such as particles and prepositions. The third stage is applied to the output of stage one: it parses the sentences that contain a given word pair in order to identify the proper syntactic relation between the two words. A secondary effect of the third stage is to filter out a number of candidate collocations as irrelevant and thus produce higher quality output. In this paper we present an overview of Xtract and describe several uses for Xtract and for the knowledge it retrieves, such as language generation and machine translation.
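
The first stage described above (word pairs found in any order and at a distance) can be illustrated with a rough sketch. This is not Xtract's implementation: the fixed five-word window, the z-score filter, the threshold value, and the function names `cooccurrence_counts` and `strong_collocates` are all assumptions made for illustration, since the abstract does not give these details.

```python
from collections import Counter
from statistics import mean, pstdev


def cooccurrence_counts(sentences, target, window=5):
    """Count words seen within `window` positions of `target`, on either side.

    The window size is an assumption; the abstract only says the two words
    may appear in any order and be separated by other words.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def strong_collocates(counts, min_z=1.0):
    """Keep collocates whose frequency is at least `min_z` standard
    deviations above the mean frequency of all collocates (assumed filter)."""
    freqs = list(counts.values())
    if len(freqs) < 2:
        return []
    mu, sigma = mean(freqs), pstdev(freqs)
    if sigma == 0:
        return []
    return sorted(w for w, f in counts.items() if (f - mu) / sigma >= min_z)


if __name__ == "__main__":
    corpus = [
        "the stock market fell sharply",
        "the market for stock options rose",
        "a volatile stock market worries investors",
    ]
    pair_counts = cooccurrence_counts(corpus, "stock")
    print(strong_collocates(pair_counts, min_z=2.0))
```

A purely frequency-based filter like this one also lets frequent function words through at lower thresholds, which is one reason a later parsing stage that checks the syntactic relation is useful.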

Any major dictionary needs to be continuously updated to keep current with the rapid growth of language. Faced with the need for such endless supplementation, Oxford University Press, in considering the available choices, concluded that automating the Oxford English Dictionary offered the only practicable solution. Traditional “cut‐and‐paste” methods of revision were ruled out as inadequate to meet the requirements of a task of such staggering proportions. But even if Oxford University Press had not planned to revise or enhance the Oxford English Dictionary, automation would still have brought enormous benefits to those who make extraordinary demands on dictionaries, for a traditional printed dictionary is a very unsatisfying reference tool in many ways, serving only if its conditions of strict lineality are accepted. If, on the other hand, an unreasonable demand of a dictionary is made, such as “Print out two carefully dated lists of all English adjectives ending in ‐ic and ‐ical,” then the traditional dictionary is about as useful as a sling‐shot or a royal command that the tides cease. Computerization, thus, can transform the linear reference tool into a versatile research tool with multiple points of access and provide scholars with a lexicographical research tool of unparalleled subtlety and power.
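
The "unreasonable demand" quoted above becomes a short query once entries are machine-readable. The sketch below assumes a hypothetical `Entry` record with a headword, a part of speech, and a first-citation year; it does not reflect the actual structure of the Oxford English Dictionary database, and the years in the sample data are placeholders, not real datings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical dictionary record; field names are assumptions."""
    headword: str
    pos: str
    first_cited: int  # placeholder year, not a real dating


def dated_by_suffix(entries, suffix, pos="adj"):
    """Return (year, headword) pairs for entries with the given part of
    speech and ending, ordered by first-citation year."""
    matches = [e for e in entries if e.pos == pos and e.headword.endswith(suffix)]
    return sorted((e.first_cited, e.headword) for e in matches)


if __name__ == "__main__":
    sample = [
        Entry("historic", "adj", 1002),
        Entry("historical", "adj", 1001),
        Entry("music", "noun", 1000),
        Entry("comic", "adj", 1003),
    ]
    print(dated_by_suffix(sample, "ic"))    # adjectives ending in -ic
    print(dated_by_suffix(sample, "ical"))  # adjectives ending in -ical
```

Because a word ending in -ical does not end in the letters "ic", the two suffix queries yield disjoint lists without extra filtering.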

Articles published on Lexicographic Tools

Xtract: An overview

Using collocations for language generation

The New Oxford English Dictionary and the Uses of Machine‐Readable Dictionaries
