Human Associations and the Choice of Features for Semantic Verb Classification

Sabine Schulte im Walde

doi:10.1007/s11168-008-9044-8

Abstract

This article investigates whether human associations to verbs, as collected in a web experiment, can help us identify salient features for semantic verb classes. Starting from the assumption that the associations, i.e., the words that are called to mind by the stimulus verbs, reflect highly salient linguistic and conceptual features of the verbs, we apply a cluster analysis to the verbs, based on the associations, and validate the resulting verb classes against standard approaches to semantic verb classes. We then perform various clusterings on the same verbs using standard corpus-based feature types, and evaluate them against the association-based clustering as well as against GermaNet and FrameNet classes. Comparing the cluster analyses provides insight into the usefulness of standard feature types in verb clustering, and lets us assess shallow vs. deep syntactic features and the role of corpus frequency. We show that (a) there is no significant preference for using a specific syntactic relationship (such as direct objects) as nominal features in clustering; (b) simple window co-occurrence features are not significantly worse (and in some cases even better) than selected grammar-based functions; and (c) a restricted feature choice disregarding high- and low-frequency features is sufficient. Finally, by applying the feature choices to GermaNet and FrameNet verbs and classes, we address the question of whether the same types of features are salient for different types of semantic verb classes. Varying the gold standard classification shows that the clustering results differ significantly, even when they rely on the same features.
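To make the feature types named in the abstract concrete, the following is a minimal sketch of window co-occurrence features with a frequency restriction, plus a cosine similarity that a clustering step could build on. It is not the article's implementation: the function names, the window size, the frequency thresholds, and the English toy sentence are all assumptions chosen for illustration.

```python
from collections import Counter
import math


def window_features(tokens, targets, window=2):
    """Count words within +/- `window` tokens of each target verb.

    `window` is a hypothetical setting, not a value taken from the article.
    """
    feats = {t: Counter() for t in targets}
    for i, tok in enumerate(tokens):
        if tok in feats:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    feats[tok][tokens[j]] += 1
    return feats


def restrict_by_frequency(feats, min_total, max_total):
    """Drop features whose total count over all verbs is outside
    [min_total, max_total]. Both thresholds are illustrative assumptions."""
    totals = Counter()
    for counts in feats.values():
        totals.update(counts)
    keep = {f for f, n in totals.items() if min_total <= n <= max_total}
    return {
        verb: Counter({f: n for f, n in counts.items() if f in keep})
        for verb, counts in feats.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[f] * b[f] for f in a if f in b)
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy usage on an invented sentence (not data from the article).
tokens = "she eats bread and he eats soup".split()
toy_feats = window_features(tokens, {"eats"}, window=1)
```

In this sketch, verbs whose restricted feature vectors have high cosine similarity would be candidates for the same cluster. The choice of clustering algorithm and of evaluation measure against a gold standard is left open here.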
