- Research Article
- 10.15398/jlm.v13i2.437
- Sep 15, 2025
- Journal of Language Modelling
- Mojca Brglez + 1 more
Phrases such as burning question, digital waste or invasion of technology are relatively ordinary expressions understood by any speaker of English. While diverse in structure and meaning, they demonstrate a semantic tension between the basic meanings of a metaphoric and a non-metaphoric constituent. Albeit frequent in discourse, they remain a challenge for automatic language processing systems, especially for smaller, less represented languages. In this work, we use a broad array of language models to embed noun phrases in Slovene and investigate the potential of word embeddings to identify metaphoric phrases via the semantic distance of their constituents as measured via cosine similarity. The study shows that both static and contextual monolingual embeddings encode relevant semantic information, while multilingual embeddings demonstrate no significant effect in this experimental setting. Moreover, the study unravels the most effective layers for basic meaning representation and highlights the influence of other, non-semantic factors on cosine similarity. By shedding light on these mechanisms, the study provides new insights for both metaphor processing and our understanding of the inner workings of language models.
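As a rough illustration of the measure this abstract mentions, the sketch below computes the cosine-based distance between the two constituents of a phrase. The vectors in `toy_embeddings` and the helper name `constituent_distance` are hypothetical and chosen only for illustration; they are not taken from the paper, whose models and data are not described here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional vectors (assumption: real embeddings would
# come from a static or contextual language model, not be hand-written).
toy_embeddings = {
    "burning": [0.9, 0.1, 0.0],
    "question": [0.0, 0.2, 0.9],
    "fire": [0.8, 0.3, 0.1],
}


def constituent_distance(phrase, embeddings):
    """Semantic distance (1 - cosine similarity) between two constituents."""
    first, second = phrase
    return 1.0 - cosine_similarity(embeddings[first], embeddings[second])


metaphoric = constituent_distance(("burning", "question"), toy_embeddings)
literal = constituent_distance(("burning", "fire"), toy_embeddings)
```

With these toy vectors, the constituents of the metaphoric phrase lie further apart than those of the literal one, which is the kind of signal the abstract describes. Whether real embeddings behave this way is the empirical question the study addresses.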
- Research Article
- 10.15398/jlm.v13i1.339
- Jun 30, 2025
- Journal of Language Modelling
- Yusuke Kubota + 1 more
This paper proposes a novel analysis of extraction pathway marking in Type-Logical Grammar, taking advantage of proof-theoretic properties of logical proofs whose empirical application has so far been underexplored. The key idea is to allow certain linguistic expressions to be sensitive to the intermediate status of a syntactic proof. The relevant conditions can be stated concisely as constraints at the level of the proof term language, formally a special type of λ-calculus. The proposed analysis does not have any direct analog to either of the two familiar techniques for analyzing extraction pathway marking, namely, successive cyclic movement in derivational syntax and the SLASH feature percolation in HPSG. Moreover, the ‘meaning-centered’ perspective that naturally emerges from this new analysis is conceptually revealing: on this approach, extraction pathway marking essentially boils down to a strategy that certain languages employ to overtly flag the existence of a semantic variable inside a partially derived linguistic expression whose interpretation is dependent on a higher-order operator that is located in a larger structure.
- Research Article
- 10.15398/jlm.v13i1.411
- Jun 9, 2025
- Journal of Language Modelling
- Kenneth Hanson
This paper presents a subregular analysis of syntactic agreement patterns modeled using command strings over Minimalist Grammar (MG) dependency trees (Graf and Shafiei 2019), incorporating a novel MG treatment of agreement. Phenomena of interest include relativized minimality and its exceptions, direction of feature transmission, and configurations involving chains of agreeing elements. Such patterns are shown to fall within the class of tier-based strictly 2-local (TSL-2) languages, which has previously been argued to subsume the majority of long-distance syntactic phenomena, as well as those in phonology and morphology (Graf 2022a). This characterization places a tight upper bound on the range of configurations that are predicted to occur while providing parameters for variation which closely match the observed typology.
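For readers unfamiliar with the class named in this abstract, the following is a minimal sketch of a tier-based strictly 2-local check over plain strings. The symbols `probe`, `goal` and `blocker`, the tier, and the forbidden bigrams are hypothetical toy choices; the paper itself works with command strings over MG dependency trees, which this sketch does not model.

```python
def tier_projection(string, tier):
    """Keep only the symbols that belong to the tier, in order."""
    return [symbol for symbol in string if symbol in tier]


def is_tsl2_wellformed(string, tier, forbidden_bigrams):
    """True if no forbidden bigram occurs on the edge-marked tier projection."""
    projected = ["<"] + tier_projection(string, tier) + [">"]
    return all(
        (left, right) not in forbidden_bigrams
        for left, right in zip(projected, projected[1:])
    )


# Hypothetical grammar: a probe must be followed on the tier by a goal,
# so a blocker or the right edge directly after a probe is ruled out.
toy_tier = {"probe", "goal", "blocker"}
toy_forbidden = {("probe", "blocker"), ("probe", ">")}

ok = is_tsl2_wellformed(["probe", "x", "y", "goal"], toy_tier, toy_forbidden)
blocked = is_tsl2_wellformed(
    ["probe", "x", "blocker", "goal"], toy_tier, toy_forbidden
)
```

Symbols off the tier (`x`, `y`) are invisible to the check, so the dependency between `probe` and `goal` is enforced locally on the tier however far apart they are in the string. An intervening `blocker` is on the tier and therefore breaks the dependency, loosely mirroring a minimality effect.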
- Research Article
- 10.15398/jlm.v13i1.397
- Jan 30, 2025
- Journal of Language Modelling
- Tina Bögel + 1 more
This paper presents a new computational implementation bridging several modules of grammar from phonetics to phonology to syntax. The system takes as input a speech signal annotated with syllables, interprets the phonetic data in phonological/prosodic terms, matches the data against a lexicon and makes the results available to a linguistically deep computational grammar. The system is showcased by means of syntactically ambiguous structures in German which can be disambiguated based on prosodic constituency information. A system evaluation with the German data showed good results for this new combination of automatic speech signal analysis and computational grammars, which takes a significant step towards a linguistically fine-grained computational analysis and hence towards real automatic speech understanding.
- Research Article
- 10.15398/jlm.v12i2.431
- Dec 10, 2024
- Journal of Language Modelling
- Micha Elsner + 1 more
Introduction to the Special Issue.
- Research Article
- 10.15398/jlm.v12i2.352
- Dec 10, 2024
- Journal of Language Modelling
- Matías Guzmán Naranjo
This paper studies the inflectional complexity of nouns, verbs and adjectives in 137 datasets, across 71 languages. I follow Ackerman and Malouf (2013) in distinguishing between E(numerative) complexity and I(ntegrative) complexity. The first one encompasses aspects of inflection, like the number of principal parts, paradigm size, and number of exponents, while the second one captures the implicative relations between paradigm cells (how difficult it is to predict one cell of a paradigm knowing a different cell). I provide a formalism and computational implementation to estimate both I- and E-complexity expressed through Word and Paradigm morphology (Blevins 2006, 2016), which is flexible and powerful enough for typological research. The results show that, as suggested by Ackerman and Malouf (2013), I-complexity is relatively low across the languages in the sample, with only two clear exceptions (Navajo and Yaitepec-Chatino). The results also show that E-complexity can vary considerably crosslinguistically. Finally, I show there is a clear correlation between I- and E-complexity.
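The notion of predicting one paradigm cell from another can be illustrated with conditional entropy, the quantity commonly associated with I-complexity in the tradition of Ackerman and Malouf (2013). The sketch below is an assumption about the general idea only: the function name and the toy exponent pairs are hypothetical, and the paper's own formalism and implementation are not reproduced here.

```python
import math
from collections import Counter


def conditional_entropy(pairs):
    """H(Y | X) in bits, estimated from observed (x, y) pairs."""
    pairs = list(pairs)
    total = len(pairs)
    joint_counts = Counter(pairs)
    x_counts = Counter(x for x, _ in pairs)
    entropy = 0.0
    for (x, _y), count in joint_counts.items():
        # p(x, y) * log2 p(y | x), summed over observed pairs
        entropy -= (count / total) * math.log2(count / x_counts[x])
    return entropy


# Hypothetical lexemes, each given as (exponent in cell A, exponent in cell B).
fully_predictable = [("-a", "-i"), ("-a", "-i"), ("-o", "-u")]
partly_predictable = [("-a", "-i"), ("-a", "-e"), ("-o", "-u"), ("-o", "-u")]

h_low = conditional_entropy(fully_predictable)
h_mid = conditional_entropy(partly_predictable)
```

A value of zero means cell B is fully determined by cell A; higher values mean that knowing cell A leaves more uncertainty about cell B.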
- Research Article
- 10.15398/jlm.v12i2.351
- Dec 10, 2024
- Journal of Language Modelling
- Coleman Haley + 2 more
In morphology, a distinction is commonly drawn between inflection and derivation. However, a precise definition of this distinction which reflects the way it manifests across languages remains elusive within linguistic theory, typically being based on subjective tests. In this study, we present 4 quantitative measures which use the statistics of a raw text corpus in a language to estimate to what extent a given morphological construction changes the form and distribution of lexemes. In particular, we measure both the average and the variance of this change across lexemes. Crucially, distributional information captures syntactic and semantic properties and can be operationalised by word embeddings. Based on a sample of 26 languages, we find that we can reconstruct 89±1% of the classification of constructions into inflection and derivation in UniMorph using our 4 measures, providing large-scale cross-linguistic evidence that the concepts of inflection and derivation are associated with measurable signatures in terms of form and distribution that behave consistently across a variety of languages. We also use our measures to identify in a quantitative way whether categories of inflection which have been considered noncanonical in the linguistic literature, such as inherent inflection or transpositions, appear so in terms of properties of their form and distribution. We find that while combining multiple measures reduces the amount of overlap between inflectional and derivational constructions, there are still many constructions near the model’s decision boundary between the two categories. This indicates a gradient, rather than categorical, distinction.
- Research Article
- 10.15398/jlm.v12i2.361
- Dec 10, 2024
- Journal of Language Modelling
- Laura Becker
This study examines zero marking, i.e. the absence of an overt exponent, in adjectival, nominal, and verbal inflectional morphology across languages. The first part of the study provides an overview of the distribution of zero markers in inflection paradigms using the UniMorph dataset. The results show that there is a general preference against zero marking. The distribution of zero markers varies to a great extent across languages and lemmas, the only robust trend being that they are avoided in cells that express a high number of grammatical values. The second part of this study examines the association between marker frequencies and phonological length, using the Universal Dependencies treebanks. While token frequency is a good predictor for the length of overt markers, it does not account for the occurrence of zero markers. This is taken as evidence to support a differential non-development scenario of zero marking rather than a phonetic reduction scenario.
- Research Article
- 10.15398/jlm.v12i2.360
- Dec 10, 2024
- Journal of Language Modelling
- David Inman + 3 more
This article presents the structure of the ATLAs Alignment Module, a typological database designed to exhaustively capture language-internal variation in argument marking (indexing and flagging). The flexible design of our database can be extended to cover further aspects of morphosyntactic alignment. We demonstrate with a small diversity sample how the database can be queried and the data aggregated at different levels of structure (e.g. for a language as a whole or for individual referential types in the form of alignment statements) for the purposes of cross-linguistic comparison. The database is made available in the Cross-Linguistic Data Formats (CLDF), and we provide code that generates an array of aggregations.
- Research Article
- 10.15398/jlm.v12i1.365
- Jun 14, 2024
- Journal of Language Modelling
- Chit-Fung Lam
This paper proposes a formal analysis of two displacement phenomena in Mandarin Chinese, namely inner topicalisation and focus fronting, capturing their correlational relationships with control and complementation. It examines a range of relevant data, including corpus examples, to derive empirical generalisations. Acceptability-judgment tasks, followed by mixed-effects statistical models, were conducted to provide additional evidence. This paper presents a constraint-based lexicalist proposal that is couched in the framework of Lexical-Functional Grammar (LFG). The lexicon plays an important role in regulating the behaviour of complementation verbs as they participate in the displacement phenomena. Unlike previous analyses that cast inner topicalisation and focus fronting as restructuring phenomena, this lexicalist proposal does not rely on hypothesised clause-size differences. It captures the empirical properties more accurately and accounts for a wider range of empirical patterns. Adopting the formally explicit framework of LFG, this proposal uses constraints that have mathematical precision. The constraints are computationally implemented using the grammar engineering tool Xerox Linguistic Environment, safeguarding their precision.