A language independent machine learning model for sentiment analysis and toxicity classification in code-mixed data
- Research Article
26
- 10.1016/j.csl.2022.101407
- May 28, 2022
- Computer Speech & Language
An analysis of machine learning models for sentiment analysis of Tamil code-mixed data
- Research Article
47
- 10.1038/s41598-022-26092-3
- Dec 13, 2022
- Scientific Reports
Sentiment analysis is a process in Natural Language Processing that involves detecting and classifying emotions in texts. The emotion is focused on a specific thing, an object, an incident, or an individual. Although some tasks are concerned with detecting the existence of emotion in text, others are concerned with finding the polarity of the text, which is classified as positive, negative, or neutral. The task of determining whether a comment contains inappropriate text that affects either an individual or a group is called offensive language identification. Existing research has concentrated more on sentiment analysis and offensive language identification in monolingual data sets than in code-mixed data. Code-mixed data is formed by combining words and phrases from two or more distinct languages in a single text. It is quite challenging to identify emotion or offensive terms in the comments since noise exists in code-mixed data. The majority of advancements in hostile language detection and sentiment analysis are made on monolingual data for high-resource languages. The proposed system attempts to perform both sentiment analysis and offensive language identification for low-resource code-mixed data in Tamil and English using machine learning, deep learning, and pre-trained models such as BERT, RoBERTa, and adapter-BERT. The dataset used for this research work is taken from a shared task on Multi task learning DravidianLangTech@ACL2022. Another challenge addressed by this work is the extraction of semantically meaningful information from code-mixed data using word embedding. The results show that the adapter-BERT model gives better accuracy, 65% for sentiment analysis and 79% for offensive language identification, than the other trained models.
- Research Article
10
- 10.1016/j.irbm.2020.07.004
- Jul 20, 2020
- IRBM
Artificial Immune Systems-Based Classification Model for Code-Mixed Social Media Data
- Research Article
73
- 10.1177/1063293x211031485
- Jul 20, 2021
- Concurrent Engineering
Modern society spends much of each day on social media. Web users spend most of their time on social media, where they share many details with their friends. Information obtained from these chats has been used in several applications. Sentiment analysis has been applied to a Twitter data set to identify the emotion of a user, and different problems can be solved on that basis. First, the data from the Twitter database is preprocessed. In this step, tokenization, stemming, stop word removal, and number removal are performed. The proposed automated learning with CA-SVM based sentiment analysis model reads the Twitter data set. The tweets are then processed to extract features, which yield a set of terms. Using these terms, the tweets are clustered with TGS-K means clustering, which measures Euclidean distance according to features such as semantic sentiment score (SSS), gazetteer and symbolic sentiment support (GSSS), and topical sentiment score (TSS). Further, the method classifies the tweets with a support vector machine (CA-SVM) according to a support value measured from the above two measures. The results are validated using k-fold cross-validation. Then, classification is performed using the Balanced CA-SVM (Deep Learning Modified Neural Network). The results are evaluated and compared with existing works. The proposed model achieved 92.48% accuracy and a 92.05% sentiment score compared with existing works.
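The preprocessing step named in the abstract above (tokenization, stemming, stop word removal, number removal) could look roughly like the sketch below. It is not the authors' pipeline: the stop-word list, the suffix-stripping stemmer, and the names `preprocess` and `naive_stem` are assumptions made for illustration, using only the Python standard library.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and", "in", "on", "i", "it", "this"}

# Naive suffix stripping standing in for a real stemmer such as Porter's (an assumption).
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(tweet):
    """Tokenize, drop numbers and stop words, then stem what remains."""
    tokens = re.findall(r"[a-z0-9']+", tweet.lower())    # tokenization
    tokens = [t for t in tokens if not t.isdigit()]       # number removal
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop word removal
    return [naive_stem(t) for t in tokens]                # stemming
```

Stop words are removed before stemming here so that the list can hold plain word forms; the abstract does not say which order the authors used.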
- Research Article
9
- 10.14569/ijacsa.2021.0120430
- Jan 1, 2021
- International Journal of Advanced Computer Science and Applications
Social media has rapidly expanded over time and generated a huge repository of content. Sentiment analysis of this data has vast scope in decision support and has attracted many researchers to explore possibilities for technique enhancement and accuracy improvement. Twitter is one of the social media platforms most widely explored in the area of sentiment analysis. This paper presents a systematic survey of sentiment analysis on social networking sites, focusing mainly on Twitter sentiment analysis. The paper explores and identifies the techniques and tools used, in a well-structured approach, to find the research gaps and identify future scope in this area of research. The techniques have evolved over time to improve the efficiency of classification. A total of 55 research papers are included in this survey. The results show that Twitter is the most explored social networking site for opinion mining. The Naïve Bayes and SVM machine learning algorithms are implemented in most of the studies. Among the latest advancements, stack-based ensemble, fuzzy-based, and neural network-based classifiers are also implemented to enhance the efficiency of classification. WEKA, R Studio, and Python are the tools most used by research scholars for implementation. Overall, the research has evolved through various changes in the technologies, tools, social media platforms, and data corpora targeted.
- Research Article
- 10.37680/qalamuna.v16i2.5890
- Dec 14, 2024
- QALAMUNA: Jurnal Pendidikan, Sosial, dan Agama
Traditional curricula in Indonesia often need more flexibility in accommodating students' interests and talents, resulting in limited opportunities for students to explore their skills and creativity. SMA Plus Muthahhari Bandung implements the X-Day Curriculum to create an independent and interest-based learning model, where students can choose areas of interest such as art, language, or sports. This study aims to analyze the implementation of the X-Day Curriculum as a curriculum model that supports independent learning and the development of students' interests and talents. Data were collected with a qualitative approach through in-depth interviews, observation, and documentation analysis. The research was conducted through several technical stages, including data collection, validation, analysis, and interpretation of results. The results showed that the X-Day Curriculum gives students the freedom to choose areas of interest consisting of arts, languages, and sports, facilitating the development of independence, creativity, and life skills relevant to the needs of the 21st century. The program also encourages motivation to learn through a more flexible and enjoyable approach to learning, including annual activities such as AKBARI, which appreciates students' work. The study concluded that the X-Day Curriculum effectively shapes students who are independent, innovative, and ready to face future educational challenges.
- Book Chapter
10
- 10.1007/978-981-16-9113-3_54
- Jan 1, 2022
Sentiment analysis is the task of identifying and classifying sentiments expressed in texts. Sentiment analysis of code-mixed data is a huge challenge for the NLP community since it is very different from the traditional structures of standard languages. Code mixing refers to additions of linguistic units like phrases or words of one language to another. The mixing of languages takes place not only on sentence level but also at the word level. It is important to perform sentiment analysis on such code-mixed data for better understanding of the text and for further classification. We have implemented various basic machine learning algorithms, viz. decision tree, linear SVC, logistic regression, multinomial naive Bayes, and SGD classifier for performing sentiment analysis on code-mixed Hinglish dataset. To address the issues of phonetic typing and multilingual words, we have proposed an ensemble-based classifier to identify the sentiment expressed in code-mixed Hinglish tweets. Based on the extensive experimental analysis, we observed that XGBoost performed well in comparison with other machine learning algorithms. With the XGBoost ensemble learning algorithm, we obtained an F1-score of 83.10%, which is significantly better than the existing state-of-the-art works on the Hinglish dataset. Keywords: Code-mixed text, Sentiment analysis, XGBoost, Machine learning, Hinglish tweets
- Research Article
- 10.1088/1742-6596/1188/1/012023
- Mar 1, 2019
- Journal of Physics: Conference Series
Mathematics is a field of study that is useful in solving various problems in everyday life, which requires skill and the ability to solve those problems. In this research, the author examines the constraints faced by students on the SPLDV material. The author examines how to resolve the students' problems by applying a learning model. This research aims to find out whether the independent learning model has a significant effect on students' problem-solving ability on the SPLDV material in Class VII MTsN Batang Angkola. This research is quantitative research with experimental methods. The population is all of grade VIII at MTsN Batang Angkola, consisting of 5 classes with 175 students, and the samples were taken from classes VIII4 and VIII2, with 35 students in class VIII4 as the experimental class and 35 students in class VIII2 as the control class. The data-collecting instrument is a test given twice, before the treatment (pretest) and after the treatment (posttest). Data processing and analysis used the t-test formula. Hypothesis testing gave t_count = 2.48 > t_table = 1.6675 at the 0.05 significance level. Based on these results, the conclusion is that the independent learning model has a significant effect on students' problem solving on the SPLDV material in class VIII MTsN Batang Angkola.
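The decision rule in the abstract above compares a computed t statistic with a table value (2.48 against 1.6675 at the 0.05 level). A minimal sketch of that comparison follows. The abstract does not say which t-test variant was used, so the pooled-variance, independent-samples form and the function names `pooled_t_statistic` and `is_significant` are assumptions.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(experiment, control):
    """Independent-samples t statistic with pooled variance (equal variances assumed)."""
    n1, n2 = len(experiment), len(control)
    pooled_var = ((n1 - 1) * variance(experiment) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(experiment) - mean(control)) / standard_error


def is_significant(t_count, t_table):
    """One-sided decision rule: reject the null hypothesis when t_count exceeds t_table."""
    return t_count > t_table


# Values reported in the abstract: t_count = 2.48, t_table = 1.6675 at the 0.05 level.
reported_decision = is_significant(2.48, 1.6675)
```

With the reported values the rule rejects the null hypothesis, which matches the conclusion the abstract draws.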
- Conference Article
7
- 10.1109/pvsc40753.2019.8980474
- Jun 1, 2019
Photovoltaic (PV) power prediction is important for monitoring the performance of PV plants. The scope of this work is to develop a methodology for deriving an optimized location and technology independent machine learning (ML) model for power prediction. The prediction accuracy results demonstrated that the performance of the ML model was primarily affected by the dataset split method. In particular, for a 70:30 % train and test set approach, the ML model achieved a normalized root mean square error (nRMSE) of 0.88 % when using randomly selected samples compared to 0.94 % when using continuous samples. The accuracy of the developed model was also affected by the duration of the train set. For a random 70:30 % train and test set approach, the constructed ML topology achieved a nRMSE of 0.88 %, while when the dataset was split into a 30:30 % portion, the nRMSE was 0.95 %. Moreover, when low irradiance conditions were filtered out and 70 % of the entire dataset was randomly chosen for model training, a nRMSE of 1.41 % was obtained demonstrating that the model’s accuracy was not improved. Finally, for a random 10:30 % train and test set approach, the FNNN achieved the lowest nRMSE of 1.10 % when the model was trained using the prevailing irradiance classes.
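The two ideas the abstract above relies on, the nRMSE metric and the difference between a random and a continuous 70:30 split, can be sketched as below. The abstract does not state how the error is normalized, so the `normalizer` argument (for example a plant's nominal capacity) is an assumption, as are all function names.

```python
import math
import random


def nrmse_percent(actual, predicted, normalizer):
    """Root mean square error divided by a normalizing constant, in percent."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
    return 100.0 * rmse / normalizer


def continuous_split(samples, train_fraction=0.7):
    """Keep the original order: the first part trains, the remainder tests."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


def random_split(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy first, so both sets are drawn from the whole period."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return continuous_split(shuffled, train_fraction)
```

A random split lets the training set cover every season and irradiance condition in the record, which is one plausible reading of why the abstract reports a lower error for randomly selected samples.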
- Research Article
- 10.26858/jp.v4i3.67947
- Oct 24, 2024
- Panrita: Jurnal Bahasa dan Sastra Daerah serta Pembelajarannya
This study aims to describe students' speech ability before using the independent learning model, their ability after using it, and the effect of the independent learning model on students' speech ability. The research uses a quantitative method with a One Group Pretest-Posttest Design. The population in this study was the Class IX students of SMP Negeri 15 Bulukumba, totaling 37 students. The sample was selected with a purposive sampling technique and amounted to 19 students. The data collection technique was a speech task. The collected data were analyzed with descriptive and inferential statistics. The results of the data analysis show that the use of the independent learning model can affect students' speech skills. Hypothesis testing gave a sig. value of 0.038 < 0.05, so the alternative hypothesis (Ha) is accepted, namely that the use of the independent learning model has an effect on students' speech skills.
- Research Article
- 10.25078/gw.v8i2.2151
- Sep 29, 2021
- GUNA WIDYA: JURNAL PENDIDIKAN HINDU
The study aimed to determine the extent to which students' productivity and learning outcomes increased during the COVID-19 pandemic through the implementation of independent learning. The implementation of this independent learning model is based on government appeals and policies that require lecturers and students to carry out the learning process at their respective homes. The main subjects of this study were all students in the first semester of Class A of the Hindu Religious Education Study Program, Faculty of Dharma Acarya, Universitas Hindu Negeri I Gusti Bagus Sugriwa Denpasar, a total of 28 people. The results showed that 98% of students experienced an increase in learning productivity. This was supported by an increase in learning outcomes for 96% of students. The increase was evidenced by comparing the middle test score with the final test score. The increase in student learning outcomes cannot be separated from parents' support, lecturers' supervision, and online learning programs using Zoom. The YouTube-assisted learning program is also a way to support the implementation of independent learning, alongside learning books obtained online. This independent learning model is considered productive during the COVID-19 pandemic in supporting government policy on social distancing and in increasing student learning productivity.
- Research Article
- 10.30605/onoma.v11i1.5390
- Feb 5, 2025
- Jurnal Onoma: Pendidikan, Bahasa, dan Sastra
This study describes the concept of an independent learning model as a demand of 21st century learning that is aligned with local wisdom content. Applying the concept of glocalization in learning can increase the cognitive potential of students more optimally. This study aims to help improve the quality of learning to read and write for grade 1 elementary school students in remote coastal areas of Southwest Papua. This study uses a descriptive qualitative method to present in a structured manner the concept of an independent learning model with local content in learning Indonesian (beginning reading and writing). The data collection techniques and instruments in this study are observation, interviews, and documentation. The results of this study are 1) a description of the syntax concept of an independent learning model with local content; 2) a description of the stages of glocalization of Indonesian language learning; and 3) a description of the value of student learning outcomes. The conclusion of this study is that glocalization of Indonesian language learning for elementary school children can improve the ability to read and write very effectively through five syntax models of independent learning with local wisdom content.
- Research Article
4
- 10.1109/access.2022.3228263
- Jan 1, 2022
- IEEE Access
Social media contains a plethora of information in the form of text, images, videos, and other data. Users across the globe are increasingly sharing their data on various social media platforms. Sentiment analysis of data such as text, images, and videos is widely used to understand the feelings of users. In recent years, the convolutional neural network (CNN) has been extensively applied in various applications. The cloud computing environment is a popular service due to its reliability, availability, and easy software integration. However, CNN models are deep neural networks that have a high computational cost. There is a need for CNN models that use fewer computational resources, especially when these models are deployed in a cloud environment, due to the remote physicality of servers, resource optimization, and infrastructure cost reduction. In this research, Gabor filters are integrated with CNN models to improve image sentiment analysis in a cloud environment, with advantages such as the reduction in computation energy and time, the elimination of the need for pre-trained models, and a perceived accuracy improvement. Two variants of Gabor-CNN (G-CNN) models with a different number of pooling and normalization layers are developed. The proposed G-CNN is trained and tested using five standard databases, namely SentiBank, Twitter, MVSO, MultiView_I, and MultiView_II. Maximum classification accuracies of 91.71%, 92.52%, 97.39%, 90.88%, and 91.31% are obtained on the SentiBank, Twitter, MVSO, MultiView_I, and MultiView_II databases respectively using the developed models. The proposed G-CNN model has provided an accuracy of 92.76% on average.
- Research Article
6
- 10.13053/cys-24-4-3151
- Dec 9, 2020
- Computación y Sistemas
The paper describes the application of the code-mixed index to Indian social media texts and compares the complexity of identifying language at the word level using a BLSTM neural model. In Natural Language Processing, transliteration is one of the important and relatively less mature areas. During transliteration, issues such as language identification, script specification, and missing sounds arise in code-mixed data. Social media platforms are now widely used by people to express their opinions or interests. The language users write on social media today is often code-mixed text, i.e., a mix of two or more languages. In code-mixed data, one language is written using another language's script. To process such code-mixed text, identifying the language of each word is therefore important. The major contribution of the work is a technique for identifying the language of Hindi-English code-mixed data used on three social media platforms, namely Facebook, Twitter, and WhatsApp. We propose a deep learning framework based on the cBoW and Skip-gram models for language identification in code-mixed data. Popular word embedding features were used to represent each word. Much research has recently been done in the field of language identification, but word-level language identification in the transliterated environment remains an open research issue for code-mixed data. We have implemented a deep learning model based on BLSTM that predicts the language of origin of a word in a sequence based on the specific words that have come before it in the sequence. Multichannel neural networks combining CNN and BLSTM have been used for word-level language identification of code-mixed data in which English and Hindi Roman transliteration is used, and these are combined with cBoW and Skip-gram for evaluation.
The proposed system's BLSTM context capture module gives better accuracy with the word embedding model than with character embedding, as evaluated on our two testing sets. The problem is modeled jointly with the deep learning design. We present an in-depth empirical analysis of the proposed methodology against standard approaches for language identification.
- Research Article
- 10.26618/exposure.v12i2.12635
- Nov 30, 2023
- EXPOSURE : JURNAL PENDIDIKAN BAHASA INGGRIS
The multi-blended learning model is a combination of four learning models: the independent learning model, the face-to-face learning model, the small group learning model, and the online learning model. This research employed a descriptive qualitative method. The research sample consisted of one teacher and 36 students at SMA Negeri 2 Buru, Maluku. Data were collected through observation and interviews and then presented descriptively. The results were used to explain the challenges faced by the teacher and the students' perceptions of the implementation of the Multi-Blended Learning Model in English teaching at a remote-area school. The challenges faced in the independent learning model were low interest and motivation to learn independently, a lack of control and sufficient attention from parents, low support from parents, and the low economic level of students' parents. The challenges faced in the face-to-face classroom learning model were limited teaching time, the large number of students in one class, and differing English ability within one class. The obstacles found in the small group learning model were limited teaching time and the large number of students in one class. The obstacle found in the online learning model was that some students did not have an internet data package, so they could not keep up with information and new materials optimally. Students perceived the multi-blended learning model as very helpful for collaboration and interaction among students and with the teacher. Students became enthusiastic about learning independently and understood the teacher's material well through the video tutorials available on the teacher's YouTube content.