Applying Computational Linguistics and Language Models: From Descriptive Linguistics to Text Mining and Psycholinguistics

Gerold Schneider

doi:10.5167/uzh-108379

Abstract

This synopsis presents the application of computational linguistic tools and approaches which were developed by the author for Descriptive Linguistics, Text Mining, and Psycholinguistics. It also describes how the computational linguistic tools, which are originally based on linguistic insights and assumptions, lead to new and detailed linguistic insights when applied to different research areas, and how these insights can in turn improve the computational tools. The computational tools are based on models of language, predicting part-of-speech tags or syntactic attachment. These models, which were originally designed for the practical purpose of solving a computational linguistics task, can increasingly be used as models of human language processing. A large-scale syntactic parser is the core linguistic tool that I use. I also employ its preprocessing tools, part-of-speech taggers and chunkers, and approaches that learn from the data, so-called data-driven approaches. The use of syntactic parsing opens up a wide range of possibilities. In the first chapter, I summarise my applications of syntactic parsing, its preprocessing tools, and other computational linguistic approaches for the benefit of Descriptive Linguistics. I describe collocations, language variation, alternations, and language change. I also describe the obvious advantages of an automatic approach: the sheer amount of data that can be processed, and the consistency, which can lead to the data-driven detection of new patterns. I also focus on the obvious disadvantage of using an automatic tool: there is always a certain level of error, which makes evaluation essential. In the second chapter, I describe the application of the same tools to Biomedical Text Mining. I evaluate the performance of our approach and summarise insights from a linguistic perspective, leaving more technical aspects aside.
In the third chapter, I argue that a syntactic parser, in particular my approach which draws a clear division between competence and performance, can be used as a model to explore formulaic and creative language use, starting with Sinclair’s (1991) distinction between idiom principle and syntax principle, and ending with the suggestion to use the parser as a psycholinguistic model. This synopsis aims to summarise 16 publications and show the connections that hold between them.

Full Text