Abstract
This research addresses the problem of studying the cluster structure of multidimensional data sets. The paper presents the results of numerical experiments on data sets consisting of co-occurrence frequencies of words from different parts of speech, for instance “noun + verb” or “adjective + noun”. The data sets are obtained from samples of Russian-language text collections. The aim of the research is to analyze the cluster structure of the studied data set and the semantic proximity of words within clusters and subclusters. The working hypothesis is that words with similar meanings should occur in approximately the same contexts. Consequently, in the feature space they lie relatively close to each other, while words with different meanings lie farther apart. The research is carried out using elastic maps, which are effective tools for visual analysis of multidimensional data. Constructing elastic maps and their extensions in the space of the first three principal components makes it possible to determine the cluster structure of the studied multidimensional data sets. Such analysis can be useful for countering negative verbal influences such as fake news, hidden propaganda, recruitment into sects, and verbal manipulation. The approach can also be applied to text collections of medical origin.
Highlights
Analyzing multidimensional data is currently one of the main directions in computer science, computational mathematics, mathematical modeling, and computer engineering
The main focus is on whether elastic map methods can be applied to analyze the thematic proximity of Russian words
We studied a transposed data set in which nouns served as the dimensions and adjectives were treated as points in the multidimensional space
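The transposed view described in the last highlight can be illustrated with a minimal sketch. It assumes a small, invented noun-by-adjective count table (the words and numbers are hypothetical, not taken from the study) and uses a plain SVD-based principal component projection; the helper name `project_to_pcs` is also an assumption. The elastic map construction itself is not shown here.

```python
import numpy as np

# Hypothetical toy counts (NOT data from the study):
# rows are nouns, columns are adjectives, entries are co-occurrence frequencies.
nouns = ["tea", "coffee", "road", "street"]
adjectives = ["hot", "strong", "long", "narrow", "sweet"]
counts = np.array([
    [9, 6, 0, 0, 7],   # tea
    [8, 9, 0, 0, 5],   # coffee
    [0, 0, 9, 6, 0],   # road
    [0, 0, 7, 9, 0],   # street
], dtype=float)


def project_to_pcs(points, n_components=3):
    """Project row vectors onto their first principal components via SVD."""
    centered = points - points.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Transposed view: each adjective is a point, each noun is a coordinate axis.
adjective_points = counts.T                  # shape (5, 4)
coords = project_to_pcs(adjective_points)    # shape (5, 3)

for word, xyz in zip(adjectives, coords):
    print(word, np.round(xyz, 2))
```

In this toy table, adjectives used with the same nouns (for example "hot" and "strong") end up closer together in the three-component space than adjectives used with different nouns (for example "hot" and "long"), which is the kind of proximity the working hypothesis relies on.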
Summary
Analyzing multidimensional data is currently one of the main directions in computer science, computational mathematics, mathematical modeling, and computer engineering. Analytical study of data, its generalization, and the identification of key dependencies make it possible to understand what the data mean. The need to process, visualize, and analyze multidimensional data has led to the intensive development of visual analytics tools (Wong, Thomas, 2004), (Thomas, Cook, 2005), (Kielman, Thomas, 2009), (Keim et al., 2010). The approaches and methods of visual analytics are constantly evolving and provide users with sufficiently reliable tools for solving many practical problems of multidimensional data exploration. These problems include data classification, cluster detection, identification of key defining parameters, establishing relationships between key parameters, etc.
More From: The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences