This study explores whether movie transcripts can be correctly classified into movie genres by means of a Discriminant Function Analysis (DFA) based on a previous comprehensive multidimensional (MD) analysis of American cinema. MD analysis is a framework for describing the salient characteristics of text varieties by means of multivariate statistical techniques, notably factor analysis. Traditionally, MD analysis has been restricted to the study of register variation and has been largely ignored in text classification research. In the MD analysis reported, a large genre-diversified movie corpus was tagged for lexico-grammatical features with the Biber tagger, and the resulting factor scores were used as input for the DFA. The results showed that particular movie genres could be successfully predicted from the MD analysis, thereby lending credence to movie genre distinctions while also underscoring the robustness of MD factor scores as reliable predictors of genre distinctions.