Abstract

Our hypothesis is that the size (how many) and constitution (which verbs) of the verb groups produced by human subjects in a sorting task can be used to derive the semantic features that characterize both individual lexical items and the domain as a whole. We investigated whether, and how, relations and patterns of this kind can be discovered for the set of motion-related verbs, based on verb clusters provided by human subjects. This paper presents a computational method that aims to discover the most salient features and their degree of salience. The approach resembles vector-based semantic space models, which rely on patterns of word co-occurrence to derive similarity estimates ([7], [8]). It differs from them in that such models extract information from either the broader lexical context or the syntactic context of the target word, whereas our approach targets groupings based on closer semantic similarity within a well-defined conceptual and semantic domain (e.g., words describing human locomotion). In our formalisation, both the rows and the columns of the raw matrix are target words; that is, it is a verb-verb matrix. Although this approach might appear narrow and restricted to the domain it applies to, it is justified by research and intuitions in lexical semantics and in human categorization. Thus, studying how words that are partial synonyms of each other, and that can be subsumed under the same superordinate term, are grouped can reveal the underlying features that characterize both the semantic field and its basic (superordinate) term. Moreover, semantic space models have been criticized precisely for not addressing the nature of the semantic relationship that underlies the proximity of words in the semantic space [7]. We address this shortcoming by using a feature-verb matrix to estimate feature weights.
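To make the verb-verb formulation concrete, here is a minimal Python sketch under stated assumptions, not the paper's own procedure: each subject's sort is a list of groups, a matrix cell counts how many subjects placed two verbs in the same group, and similarity between two verbs is the cosine of their row vectors, in the spirit of co-occurrence-based semantic space models. The function names, the toy verbs and sorts, and the choice to fill the diagonal with the number of subjects are all illustrative assumptions.

```python
import math
from itertools import combinations


def verb_verb_matrix(sorts, verbs):
    """Build a symmetric verb-verb matrix from (hypothetical) sorting data.

    sorts: one entry per subject; each entry is a list of groups (lists of verbs).
    Cell [i][j] counts how many subjects put verbs i and j in the same group.
    Assumption: the diagonal is set to the number of subjects, since every
    verb trivially shares a group with itself.
    """
    index = {v: k for k, v in enumerate(verbs)}
    n = len(verbs)
    m = [[0] * n for _ in range(n)]
    for groups in sorts:
        for group in groups:
            for a, b in combinations(group, 2):
                i, j = index[a], index[b]
                m[i][j] += 1
                m[j][i] += 1
    for i in range(n):
        m[i][i] = len(sorts)
    return m


def cosine(u, v):
    """Cosine similarity between two row vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Hypothetical sorts from three subjects (invented toy data, not from the study).
verbs = ["walk", "stroll", "run", "jog", "limp"]
sorts = [
    [["walk", "stroll", "limp"], ["run", "jog"]],
    [["walk", "stroll"], ["run", "jog"], ["limp"]],
    [["walk", "stroll", "jog"], ["run"], ["limp"]],
]
m = verb_verb_matrix(sorts, verbs)
sim = {(a, b): cosine(m[i], m[j])
       for (i, a), (j, b) in combinations(enumerate(verbs), 2)}
print(round(sim[("walk", "stroll")], 2))  # → 1.0
print(round(sim[("run", "jog")], 2))      # → 0.86
print(round(sim[("walk", "run")], 2))     # → 0.12
```

Comparing whole rows rather than single cells means two verbs count as similar when subjects grouped them with the same other verbs, which is the distributional intuition behind semantic space models.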
Another difference between the current approach and existing approaches in cognitive science and psychology is that, whereas the latter have used human elicitation to verify the findings of semantic space models [9], we adopt a parallel experimental strategy: we ask to what extent a computational model based on human data can be improved by using featural data elicited from the human data. The outline of the paper is as follows. The next section introduces the human sorting task experiment and its linguistic background. Section III then presents the computational method for computing feature weights and for deriving clusters based on various combinations of the features.
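The weighting scheme itself belongs to Section III and is not reproduced here. Purely as a hypothetical illustration of what estimating feature weights from a feature-verb matrix, and then clustering on a chosen combination of features, can look like, the sketch below scores a feature by how much more often subjects grouped verbs that share it than verb pairs in general, then links verbs whose shared, weighted features exceed a threshold (single-link clustering via union-find). The feature names, toy counts, weighting formula, and threshold are all assumptions, not the paper's method.

```python
from itertools import combinations

# Hypothetical toy inputs, invented for illustration (not data from the study).
VERBS = ["walk", "stroll", "run", "jog", "limp"]
N_SUBJECTS = 3
# Number of subjects who put each verb pair in the same group (unlisted pairs: 0).
CO_GROUPED = {
    ("walk", "stroll"): 3, ("walk", "limp"): 1, ("stroll", "limp"): 1,
    ("run", "jog"): 2, ("walk", "jog"): 1, ("stroll", "jog"): 1,
}
# Feature-verb matrix: one row per feature, columns follow VERBS (1 = verb has the feature).
FEATURES = {
    "slow":     [1, 1, 0, 0, 1],
    "fast":     [0, 0, 1, 1, 0],
    "impaired": [0, 0, 0, 0, 1],
}


def rate(a, b):
    """Proportion of subjects who grouped verbs a and b together."""
    return CO_GROUPED.get((a, b), CO_GROUPED.get((b, a), 0)) / N_SUBJECTS


def feature_weights(verbs, features):
    """Hypothetical weight: mean co-grouping rate of verb pairs sharing the
    feature minus the mean co-grouping rate over all verb pairs."""
    pairs = list(combinations(range(len(verbs)), 2))
    baseline = sum(rate(verbs[i], verbs[j]) for i, j in pairs) / len(pairs)
    weights = {}
    for name, row in features.items():
        shared = [(i, j) for i, j in pairs if row[i] and row[j]]
        if not shared:
            weights[name] = 0.0  # fewer than two verbs carry the feature: no evidence
            continue
        mean_shared = sum(rate(verbs[i], verbs[j]) for i, j in shared) / len(shared)
        weights[name] = mean_shared - baseline
    return weights


def clusters(verbs, features, weights, subset, threshold):
    """Single-link clustering on a chosen subset of features: two verbs are
    linked when the summed weights of the subset features they share exceed
    `threshold`."""
    parent = list(range(len(verbs)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(verbs)), 2):
        sim = sum(weights[f] for f in subset if features[f][i] and features[f][j])
        if sim > threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i, verb in enumerate(verbs):
        groups.setdefault(find(i), []).append(verb)
    return sorted(groups.values())


weights = feature_weights(VERBS, FEATURES)
print(clusters(VERBS, FEATURES, weights, ["slow", "fast", "impaired"], 0.1))
# → [['run', 'jog'], ['walk', 'stroll', 'limp']]
print(clusters(VERBS, FEATURES, weights, ["fast"], 0.1))
# → [['limp'], ['run', 'jog'], ['stroll'], ['walk']]
```

Varying `subset` shows how different feature combinations yield different clusterings, which can then be compared against the groupings the subjects actually produced.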
