In this paper, we describe and test several ways of using machine learning to analyze large collections of text data from social networks (namely, public Telegram chats), to retrieve relevant social or cultural information from them, and to visualize the results. The advantage of the proposed approach is that, by covering large amounts of data, it can reveal hidden patterns of social, political, or cultural behavior. It can therefore complement the standard methodology of social surveys. Automatically detecting cultural bias in social media requires mastering methods for measuring and visualizing its different kinds, such as cultural shifts, specific national or group refractions, mutations, and stereotypes. We argue that cultural bias is a result of nonrandom errors in thinking. It is based, firstly, on people's understanding of themselves and the world around them and, secondly, on the translation of this understanding into abstractions in the form of common misconceptions, ideologemes, narratives, and slogans. In society, such bias inevitably leads to the separation of one social group or subculture from another. Social networks (both classic ones and newer formats, for example, messengers with public chat options) are the most active ground for the manifestation of this phenomenon. Since the discussion of sociopolitical and cultural contexts in chats takes place in public, the participants in such a communicative act tend to seek the approval of the social group to which they are ideologically close. It is this phenomenon that allows us to form comparisons of the “friend - foe” type, which in turn lead to unconscious cultural shifts. Thus, mastering methods for properly identifying cultural shifts is not only relevant but crucial for intra- and intercultural communication, for controlling the level of aggressiveness in society, and for understanding its mood.
As helpful illustrations, readers will find the semantic associations elicited by the words “freedom”, “democracy”, and “Internet”; a sociocultural analysis of several topical clusters (e.g., Россия [Russia], страна [country], Путин [Putin], русский [Russian], православный [Orthodox]); and visualizations of the semantic associations for the same three words.