Abstract

The present study is part of a larger research project developing computational tools for large-scale corpus-based semantic analyses. One such tool represents semantic structure with vector space models (VSMs). The paper shows that this tool and the models built with it require a deeper understanding, especially with a view to how their results relate to cognitive theories of meaning. Although token-based VSMs are increasingly used in corpus-based cognitive semantics, we believe it is insufficiently appreciated how alternative parameter settings deal with a range of semantic issues, such as granularity of meaning, prototypicality of the domain of application and interaction with syntactic patterns. For the purpose of this paper, we will focus on only one of those issues, viz. the prototypicality of the domain of application, presenting the results of three of our case studies on the Dutch adjectives heilzaam, hoekig, hachelijk and geldig. The models presented are built from a 520MW corpus of contemporary Dutch and Flemish newspapers by varying parameters such as window size, part-of-speech and frequency thresholds in the selection of features. The resulting VSMs are evaluated through visual analytics: although multidimensional, they can be reduced to 2D and represented in scatterplots where more similar tokens appear closer to each other. The color-coding with manual sense tags employed here makes it possible to compare the groupings provided by human annotators with those of the computational models in a way consistent with the cognitive approach to meaning and categorization.
