The rise in students' digital activity and social media presence, combined with the lack of regulation of these platforms, has given rise to another form of bullying, popularly known as cyberbullying. Cyberbullying refers to bullying that takes place over any web-based or electronic platform; it is one of the most serious problems in schools nationwide, and it significantly affects the mental and physical health of its victims. Because today's information technology infrastructure lets perpetrators act with greater secrecy, cyberbullying continues to occur frequently and to spread widely. Understanding cyberbullying trends and preventing them with suitable machine learning algorithms could help many school students lead better lives and make better decisions, helping them grow and flourish into capable future leaders. Hence, this research paper focuses on adolescent girls, using tools and techniques such as text analytics and image analytics. The authors study a sample of netizens. The analysis is conducted in New Delhi, and the real-world data consists of English-language tweets extracted from Twitter. Appropriate data mining algorithms are applied to this data to find hidden patterns, and further analyses are then conducted to understand the psychology of girls and boys and the tone and voice of the tweets/posts. All of this uses openly available information on the platform, namely the users' tweets. Because the entire process can be automated and tweets are filtered or flagged based on the data alone, the approach introduces little to no bias, giving access to unbiased data. Bias, in this case, is defined as prejudice in the actions of, and responses received from, a participant. The results are then analysed in terms of polarity and subjectivity, and understanding psychology and personality traits helps in drawing insights from the collected expressions.
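The abstract does not name the tool used to compute polarity and subjectivity. As a rough illustration only, the sketch below assumes a simplified lexicon-averaging scheme: each word in a hypothetical lexicon carries a polarity in [-1, 1] and a subjectivity in [0, 1], and a tweet's scores are the means over the lexicon words it contains. The lexicon, its values, and the `score` function are assumptions for illustration, not the authors' method; real sentiment libraries also handle negation and intensifiers, which this sketch ignores.

```python
import re
from statistics import mean

# Hypothetical toy lexicon: word -> (polarity in [-1, 1], subjectivity in [0, 1]).
# The values are illustrative only and are not taken from the study.
LEXICON = {
    "love": (0.5, 0.6),
    "great": (0.8, 0.75),
    "ugly": (-0.7, 1.0),
    "stupid": (-0.8, 1.0),
    "hate": (-0.8, 0.9),
    "loser": (-0.6, 0.8),
}


def score(text: str) -> tuple[float, float]:
    """Return (polarity, subjectivity) as means over the lexicon words found.

    Words not in the lexicon are ignored; a text containing no lexicon
    words scores (0.0, 0.0), i.e. neutral and objective.
    """
    words = re.findall(r"[a-z']+", text.lower())
    hits = [LEXICON[w] for w in words if w in LEXICON]
    if not hits:
        return 0.0, 0.0
    return mean(p for p, _ in hits), mean(s for _, s in hits)


if __name__ == "__main__":
    # "stupid" and "ugly" are both in the lexicon, so the tweet scores
    # strongly negative and fully subjective.
    print(score("You are so stupid and ugly"))
```

Averaging keeps scores on the same scale regardless of tweet length, which makes tweets of different lengths comparable when they are later ranked.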
The authors study the bios, likes, and comments of the sample using a lexical and syntactical approach. Six thousand top tweets are extracted, and the 15 tweets that score highest on polarity and subjectivity are taken for further analysis. The tweets are filtered using the 20 most popular profane words, identified from 16 responses from a focus group. Because the data is extracted from Twitter (i.e., a secondary data source), the authors address a gap in current psychological analyses. Such studies usually circulate questionnaires to understand the participants; in this research, however, the authors study the data without involving the individuals concerned, thereby eliminating human bias, which is a significant limitation of gathering responses through questionnaires. There is considerable scope for further streamlining the model. The inferences include an understanding of how a social media platform is regulated, the degree of aggression on the platform, and an effort to identify those who cause such aggression.
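The selection step described above (filter by a profane-word list, then keep the highest-scoring tweets) can be sketched as a filter followed by a ranking. The abstract does not state how polarity and subjectivity are combined, so this sketch assumes a combined rank of |polarity| + subjectivity. The function name `select_top_tweets`, the pluggable `scorer`, and the example word list are hypothetical; the study's actual 20-word list is not given. Any function mapping a tweet to a (polarity, subjectivity) pair can be passed as `scorer`.

```python
import re
from typing import Callable


def select_top_tweets(
    tweets: list[str],
    profane_words: set[str],
    scorer: Callable[[str], tuple[float, float]],
    k: int = 15,
) -> list[str]:
    """Keep tweets containing at least one listed word, then return the k
    highest-ranked by |polarity| + subjectivity (an assumed ranking rule)."""
    # Step 1: flag tweets that contain any word from the (hypothetical) list.
    flagged = [
        t for t in tweets
        if profane_words & set(re.findall(r"[a-z']+", t.lower()))
    ]

    # Step 2: rank flagged tweets by strength of sentiment plus subjectivity.
    def rank(t: str) -> float:
        polarity, subjectivity = scorer(t)
        return abs(polarity) + subjectivity

    # sorted() is stable, so ties keep their original order.
    return sorted(flagged, key=rank, reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical precomputed scores standing in for a real sentiment scorer.
    demo = {
        "you idiot": (-0.9, 1.0),
        "what a loser lol": (-0.4, 0.5),
        "nice day": (0.6, 0.8),
    }
    print(select_top_tweets(list(demo), {"idiot", "loser"}, lambda t: demo[t]))
    # prints ['you idiot', 'what a loser lol']
```

Ranking by absolute polarity treats strongly negative and strongly positive tweets alike; if only hostile tweets are of interest, sorting by ascending polarity would be the natural alternative.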