Abstract

This paper addresses the contribution of quantitative analysis and statistical techniques to qualitative semantic analysis. It discusses the methodological issues involved in clustering and plotting the most significant first-order co-occurrences of a word as a way to explore its degree of semantic heterogeneity in a technical corpus. Since distributional (dis)similarity reflects semantic (dis)similarity, first-order co-occurrences are clustered with respect to the second- and/or third-order co-occurrences they have in common. In this comparative and exploratory study, several experiments are carried out to evaluate the impact of various clustering parameters and to find the most reliable configuration of parameters, including association measures, distance measures, and lower and upper thresholds. Multidimensional scaling techniques and the visual exploration of semantic proximity between the first-order co-occurrences of a node allow us to gain insight into the phenomena of semantic homogeneity and heterogeneity in a technical corpus and, as a consequence, to come to a better understanding of the semantic characteristics of specialized language. However, the methodology for this kind of analysis is still being developed. With the experiments described in this paper, we contribute to the ongoing methodological analysis of the measures and parameters to be used in distributional semantics.
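
As a rough illustration of the clustering idea described above, the following minimal sketch groups the first-order co-occurrences of a node by the second-order co-occurrences they share. Everything in it is an assumption made for illustration: the toy data, the function names, the use of Jaccard distance as the distance measure, and single-linkage merging with one upper distance threshold. It does not reproduce the association measures, distance measures, or thresholds evaluated in the paper.

```python
from itertools import combinations

# Hypothetical toy data: each first-order co-occurrence of a node word is
# mapped to the set of its own significant co-occurrences, i.e. the
# second-order co-occurrences of the node.
second_order = {
    "spindle": {"speed", "rotation", "axis", "motor"},
    "tool": {"speed", "rotation", "axis", "holder"},
    "software": {"version", "license", "update"},
    "program": {"version", "update", "file"},
}


def jaccard_distance(a, b):
    """Distance in [0, 1]: 0 for identical sets, 1 for disjoint sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(profiles):
    """Pairwise distances, keyed by alphabetically ordered word pairs."""
    words = sorted(profiles)
    return {
        (w1, w2): jaccard_distance(profiles[w1], profiles[w2])
        for w1, w2 in combinations(words, 2)
    }


def single_linkage(profiles, max_distance):
    """Merge clusters whenever any cross-cluster pair lies within max_distance."""
    clusters = [{w} for w in sorted(profiles)]
    pairs = sorted(distance_matrix(profiles).items(), key=lambda kv: kv[1])
    for (w1, w2), dist in pairs:
        if dist > max_distance:
            break  # all remaining pairs are farther apart than the threshold
        c1 = next(c for c in clusters if w1 in c)
        c2 = next(c for c in clusters if w2 in c)
        if c1 is not c2:
            clusters.remove(c2)
            c1 |= c2
    return clusters


if __name__ == "__main__":
    for cluster in single_linkage(second_order, max_distance=0.6):
        print(sorted(cluster))
```

Under these assumptions, a node whose first-order co-occurrences fall into several well-separated clusters would be read as semantically more heterogeneous than one whose co-occurrences form a single cluster. The same pairwise distances could also be passed to a multidimensional scaling routine to plot the co-occurrences in two dimensions.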
