Abstract
We present a novel semantic framework for modeling linguistic expressions of generalization (generic, habitual, and episodic statements) as combinations of simple, real-valued referential properties of predicates and their arguments. We use this framework to construct a dataset covering the entirety of the Universal Dependencies English Web Treebank. We use this dataset to probe the efficacy of type-level and token-level information, including hand-engineered features and static (GloVe) and contextual (ELMo) word embeddings, for predicting expressions of generalization.
Highlights
Natural language allows us to convey information about particular individuals and events, as in (1), and generalizations about those individuals and events, as in (2).
Taking inspiration from decompositional semantics (Reisinger et al., 2015; White et al., 2016), we suggest that linguistic expressions of generalization should be captured in a continuous multilabel system rather than a multi-class system.
The ACE-2005 Multilingual Training Corpus (Walker et al., 2006) extends these annotation guidelines, providing two additional classes: (i) negatively quantified entries (NEG) for referring to empty sets and (ii) underspecified entries (USP), where the referent is ambiguous between GENERIC and SPECIFIC.
Summary
Natural language allows us to convey information about particular individuals and events, as in (1), and generalizations about those individuals and events, as in (2). One obstacle to further progress on generalization is that current frameworks tend to treat standard descriptive categories as sharp classes, e.g. EPISODIC, GENERIC, and HABITUAL for statements and KIND and INDIVIDUAL for noun phrases. This may seem reasonable for sentences like (1a), where Mary clearly refers to a particular individual, or (3a), where Bishops clearly refers to a kind; but natural text is less forgiving (Grimm, 2014, 2016, 2018). Taking inspiration from decompositional semantics (Reisinger et al., 2015; White et al., 2016), we suggest that linguistic expressions of generalization should be captured in a continuous multilabel system rather than a multi-class system. We do this by decomposing categories such as EPISODIC, HABITUAL, and GENERIC into simple referential properties of predicates and their arguments. We find that (i) referential properties of arguments are easier to predict than those of predicates, and that (ii) contextual learned representations contain most of the relevant information for both arguments and predicates (§9).
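The contrast between a multi-class and a continuous multilabel representation can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the property names (`particular`, `kind`, `abstract`), the scores, and the 0.5 threshold are hypothetical and are not taken from the dataset or the annotation protocol described here.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ArgumentProperties:
    # Hypothetical property names, each a score in [0, 1];
    # they stand in for "simple referential properties" of an argument.
    particular: float
    kind: float
    abstract: float


def collapse_to_class(props: ArgumentProperties) -> str:
    """Multi-class view: force one sharp label, discarding the other scores."""
    return "KIND" if props.kind > props.particular else "INDIVIDUAL"


def active_properties(props: ArgumentProperties, threshold: float = 0.5) -> list[str]:
    """Multilabel view: keep every property scoring at or above the threshold."""
    return [name for name, score in asdict(props).items() if score >= threshold]


# A clear-cut argument and a mixed one (illustrative scores only).
clear_case = ArgumentProperties(particular=0.95, kind=0.05, abstract=0.0)
mixed_case = ArgumentProperties(particular=0.6, kind=0.7, abstract=0.1)

print(collapse_to_class(clear_case), active_properties(clear_case))
print(collapse_to_class(mixed_case), active_properties(mixed_case))
```

For the mixed case, the sharp class reports only KIND, while the multilabel view retains both `particular` and `kind`. This is the kind of information a multi-class scheme cannot express.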
More From: Transactions of the Association for Computational Linguistics