Computer application of a syntactic density measure

Carole L Kidder, Lester S Golub

doi:10.1007/bf02402555

Abstract

complexity of syntactic structures in different levels of graded reading materials and in the oral and written language of children at different levels of development. Research in language education and in language development can be facilitated by using the computer to analyze and measure syntactic density (also called syntactic complexity or syntactic maturity by some researchers). This paper explains one computerized program for measuring this quality of language.

One characteristic of language development in children is that, as they mature, they use increasingly complex structures. Although language acquisition is most rapid in the preschool period, research has shown (Hunt, 1965, 1970; Loban, 1963, 1970; O'Donnell, Griffin, and Norris, 1967) that students of elementary and secondary school ages continue to develop their ability to manipulate language by employing increasing numbers of complex syntactic structures. Much of the recent research in syntactic development (Hunt, 1970; Loban, 1970; Golub and Frederick, 1971) has been aimed at discovering, describing, and specifying the characteristics of syntax that distinguish degrees of syntactic complexity.

Through a series of studies of children's oral and written discourse, Golub (1974) has developed a Syntactic Density instrument to tabulate the occurrences of specific linguistic structures that correlate with teachers' judgments of writing samples. In an early stage of the study, sixty-three linguistic variables were listed. Multivariate analysis isolated the ten that correlated most highly with teachers' high ratings, and canonical correlation assigned a relative weight to each variable according to the degree of its contribution to "syntactic density."
When the variables are counted and weighted, the products are added and the total is divided by the number of T-units in the sample to arrive at a single syntactic density score, which is printed out on a Tabulation Sheet. (A T-unit is a main clause with all of its subordinate clauses.) The variables included in Golub's formula reflect structures that have been identified in linguistic theory as complex. Measures of mean main clause length and mean subordinate clause length are combined with measures of these other types of complexity. Golub's formula not only incorporates the measures of T-unit length and subordinate clause length that Hunt and others have found useful, but also reflects complex verb expansions, use of some advanced structures of time, and reductions or embeddings that take the form of prepositional phrases. A PL/I program has been written by Kidder to apply the formula to samples of natural language.

Encoding Conventions for Data

Text to be analyzed by the computer program must be prepared in blank-delimited form in columns 1 to 72. This means that each word and each syntactic punctuation mark must be preceded and followed by at least one blank. Lexical punctuation, as in hyphenated words or abbreviations, is not separated by blanks from its associated character string. Multiple blanks are ignored. Quotation marks surrounding conversation may be omitted. At the end of each paragraph, the final mark of punctuation must be doubled. To separate samples, three dollar signs ($$$) must appear in columns 1 to 3 at the end of each sample.
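The scoring rule and the encoding conventions described above can be illustrated with a minimal Python sketch. It is not the original PL/I program. The names `parse_samples` and `density_score`, the variable names, and the weights passed in are hypothetical; the source does not list the ten variables or their weights here. The sketch also assumes that a "doubled" final mark may appear either as one token (`..`) or as two blank-separated tokens (`. .`), and that only `.`, `!`, and `?` can close a paragraph.

```python
# Assumption: the marks that may end a paragraph (not specified in the source).
END_MARKS = {".", "!", "?"}


def parse_samples(lines):
    """Split card-image lines into samples -> paragraphs -> tokens.

    Only columns 1 to 72 are read, tokens are blank-delimited, a doubled
    final mark closes a paragraph, and '$$$' in columns 1 to 3 closes a sample.
    Text that is not followed by a '$$$' line is discarded.
    """
    samples, paragraphs, tokens = [], [], []
    for line in lines:
        card = line.rstrip("\n")[:72]      # ignore anything past column 72
        if card[:3] == "$$$":              # sample terminator
            if tokens:                     # keep an unterminated last paragraph
                paragraphs.append(tokens)
            samples.append(paragraphs)
            paragraphs, tokens = [], []
            continue
        for tok in card.split():           # split() ignores multiple blanks
            if len(tok) == 2 and tok[0] == tok[1] and tok[0] in END_MARKS:
                # Doubled mark written as one token, e.g. ".."
                tokens.append(tok[0])
                paragraphs.append(tokens)
                tokens = []
            elif tok in END_MARKS and tokens and tokens[-1] == tok:
                # Doubled mark written as two tokens, e.g. ". ."
                paragraphs.append(tokens)
                tokens = []
            else:
                tokens.append(tok)
    return samples


def density_score(counts, weights, t_units):
    """Weighted sum of variable counts divided by the number of T-units."""
    if t_units <= 0:
        raise ValueError("a sample must contain at least one T-unit")
    total = sum(weights[name] * counts.get(name, 0) for name in weights)
    return total / t_units


if __name__ == "__main__":
    cards = [
        "The  dog barked .   It ran away . .",
        "The cat slept . .",
        "$$$",
    ]
    print(parse_samples(cards))
    # Hypothetical variables and weights, for illustration only.
    print(density_score({"var_a": 4, "var_b": 3}, {"var_a": 0.5, "var_b": 2.0}, 4))
```

Counting the linguistic variables and T-units themselves requires syntactic analysis that this sketch does not attempt; `density_score` only shows the final arithmetic once those counts are available.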
