Corpus-based language study is a widespread practice in modern linguistics. In this study, we examine so-called mass literature (or paraliterature) through a corpus of texts. Because mass literature is standardized, its genres can be described in terms of literary formulas. In brief, a formula embodies cultural themes and stereotypes in a universal form. While mass literature is a common subject of literary and cultural studies, literary formulas have not been sufficiently studied from a linguistic point of view. We suggest that the microgenres of paraliterature may differ in syntax as well as in vocabulary. Our work is based on mass literature corpora and provides an analysis of the verb constructions characteristic of four microgenres (love story, detective story, science fiction novel, and fantasy). To identify the distinctive features of these microgenres, we conducted a series of machine learning experiments. As a dataset, we compiled a corpus of 1,200 texts belonging to the four microgenres. In our previous studies, we showed that statistical features (text length, sentence length, and part-of-speech frequencies) were insufficient for successful classification. Lexical features improved classification quality; however, the classifier based on syntactic features showed the best results. A total of 81 constructions were selected as the most important features for the syntax-based classifier. A verb construction consists of a verb and its dependencies. We consider two types of dependencies: arguments (subject, direct and indirect objects) and adjuncts (subordinate clauses and adverbial modifiers). The constructions are described in terms of how fully the verb’s valencies are filled. Three types of constructions are distinguished: complete, incomplete (with direct or indirect complements omitted), and extended (with adjuncts).
Specific examples of verb constructions in each microgenre have been analyzed and described. Judging by the similarity of their construction profiles, romance novels and detective stories show a predominance of verbs and direct speech constructions, whereas science fiction novels and fantasy show a prevalence of full constructions. Possible non-textual explanations for these observations are also provided. The proposed method can be used to investigate various kinds of syntactic features, for instance those associated with author, period, or genre.
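The three-way typology of verb constructions described above can be illustrated with a minimal sketch. It is not the authors' implementation: the dependency labels, the `VerbConstruction` class, the `classify` function, and the rule that a missing complement takes precedence over the presence of an adjunct are all assumptions made for illustration, since the abstract does not specify a tag set or how overlapping cases are resolved.

```python
from dataclasses import dataclass

# Hypothetical dependency labels; the abstract does not name a tag set.
COMPLEMENTS = frozenset({"direct_object", "indirect_object"})
ADJUNCTS = frozenset({"subordinate_clause", "adverbial"})


@dataclass(frozen=True)
class VerbConstruction:
    """A verb together with its dependencies (illustrative structure)."""
    verb: str
    expected: frozenset  # arguments the verb's valency calls for (assumed known)
    observed: frozenset  # dependencies actually attached to the verb in the text


def classify(construction: VerbConstruction) -> str:
    """Label a construction as 'complete', 'incomplete', or 'extended'.

    Assumption: an omitted complement is checked first, so a construction
    that both lacks a complement and carries an adjunct counts as incomplete.
    """
    missing = (construction.expected & COMPLEMENTS) - construction.observed
    if missing:
        return "incomplete"
    if construction.observed & ADJUNCTS:
        return "extended"
    return "complete"


# Illustrative valency frame for a ditransitive verb.
GIVE_FRAME = frozenset({"subject", "direct_object", "indirect_object"})

examples = [
    VerbConstruction("give", GIVE_FRAME, GIVE_FRAME),
    VerbConstruction("give", GIVE_FRAME, frozenset({"subject", "direct_object"})),
    VerbConstruction("give", GIVE_FRAME, GIVE_FRAME | {"adverbial"}),
]

for example in examples:
    print(example.verb, sorted(example.observed), classify(example))
```

Counting these labels per text would give one possible form of the construction profile used as classifier input, although the abstract does not say how the 81 selected constructions were actually encoded.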