Abstract
In this article we discuss the use of large corpora and databases as a first step in the qualitative analysis of linguistic data. We concentrate on ASIt, the Syntactic Atlas of Italy, and consider the different types of dialectal data that can be collected from such corpora and databases. We analyze the methodological problems arising from the necessary compromise between the strict requirements imposed by scientific inquiry and the management of large amounts of data. As a possible solution, we propose that the type of variation is per se a tool for deriving meaningful generalizations. To implement this idea, we examine three different types of variation patterns that can be used in the study of morpho-syntax: the geographical distribution of properties (and their total or partial overlap, or complementary distribution), the so-called "leopard spots" variation, and the lexical variation index, which can be used to determine the internal complexity of functional items.
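To make the last notion concrete, here is a minimal sketch of how such an index could be computed. The toy data, the function name variation_index, and the normalization (distinct forms divided by the number of survey points) are our own illustrative assumptions, not the measure as defined in the paper.

```python
def variation_index(forms):
    """Hypothetical lexical variation index: the number of distinct
    forms attested for a functional item, normalized by the number
    of survey points (1/n = a single shared form, 1.0 = every
    point uses a different form)."""
    if not forms:
        raise ValueError("no attestations")
    return len(set(forms)) / len(forms)

# Invented toy data: forms collected at six hypothetical survey points.
attestations = {
    "subject_clitic_2sg": ["te", "ti", "te", "tu", "te", "ti"],
    "negation_marker": ["no", "no", "no", "no", "no", "no"],
}

for item, forms in attestations.items():
    print(item, round(variation_index(forms), 2))
# subject_clitic_2sg 0.5  -> more variation, arguably more internal complexity
# negation_marker 0.17    -> essentially stable across points
```

On this (assumed) reading, a higher index would signal a richer internal structure of the functional item, since more distinct exponents surface across the dialect points.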
Highlights
In this paper we discuss the different distributional patterns that emerge when taking into consideration linguistic, and in particular syntactic, atlases and databases
We examine three different types of variation patterns that can be used in the study of morpho-syntax: the geographical distribution of properties, the so-called "leopard spots" variation, and the lexical variation index, which can be used to determine the internal complexity of functional items
3 Concluding remarks
We conclude our methodological overview of ways to handle large data sets by noting that, although a large data set is often noisier than a smaller one, where we can control our experiment much more precisely, all data contain a certain amount of noise, and it is easier to filter it out when more data are available.
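As a minimal sketch of why more data makes the same per-datum noise easier to filter, assume (our invention, not the paper's model) that each informant reports the true dialectal value correctly with a fixed probability; a simple majority vote per survey point then recovers the true value more reliably as informants are added.

```python
import random

random.seed(0)

def majority_recovers(n_informants, error_rate=0.2, trials=10_000):
    """Fraction of trials in which a majority vote over noisy binary
    judgments recovers the true value (fixed to 'yes' here): each
    informant reports it correctly with probability 1 - error_rate."""
    hits = 0
    for _ in range(trials):
        correct_votes = sum(random.random() > error_rate
                            for _ in range(n_informants))
        hits += correct_votes > n_informants / 2
    return hits / trials

for n in (1, 5, 25):
    print(f"{n:>2} informants: {majority_recovers(n):.3f}")
# Accuracy climbs from ~0.80 with one informant toward ~1.00 with 25:
# more data makes the same per-datum noise easier to filter out.
```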
Summary
In this paper we discuss the different distributional patterns that emerge when taking into consideration linguistic, and in particular syntactic, atlases and databases. We start by showing that big linguistic enterprises, such as databases, atlases like ASIt, and corpora of all sorts, always contain a certain amount of "noise". They are by definition incomplete when the hypothesis we want to test is very detailed. We believe that they can nevertheless be interesting: used in this way, the mining of large amounts of data can nicely complement our introspective type of empirical evidence. An innovative way to think about big data sets, and to tailor our questions to the linguistic evidence they provide, is to consider the type of variation itself as a clue indicating different natural classes of linguistic phenomena.
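As one illustration of how the type of variation itself can sort phenomena into classes, the sketch below (entirely our own construction, with invented grid coordinates and a simple nearest-neighbor agreement statistic) separates a compact areal distribution from a scattered "leopard spots" one.

```python
from math import dist

def nearest_neighbor_agreement(points):
    """Fraction of survey points whose nearest neighbor shares the
    same value for a binary property. Values well above chance (0.5)
    suggest a compact areal distribution; values at or below chance
    suggest a scattered 'leopard spots' pattern."""
    agree = 0
    for i, (xy, value) in enumerate(points):
        nearest = min((p for j, p in enumerate(points) if j != i),
                      key=lambda p: dist(xy, p[0]))
        agree += value == nearest[1]
    return agree / len(points)

# Invented 4x4 grids of survey points carrying a binary property.
areal = [((x, y), x < 2) for x in range(4) for y in range(4)]
spots = [((x, y), (x + y) % 2 == 0) for x in range(4) for y in range(4)]

print("areal-like pattern:", nearest_neighbor_agreement(areal))    # 0.75
print("leopard-spots pattern:", nearest_neighbor_agreement(spots))  # 0.0
```

On real atlas data one would of course use geographic coordinates and a more robust spatial statistic, but the point stands: the shape of the variation, not just its presence, carries information about the underlying class of phenomenon.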