The COVID-19 pandemic affected people's mood, and this was evident on social networks. Posts by ordinary users are a source of information for measuring public opinion on social phenomena. Twitter in particular is a valuable resource because of the volume of information, the geographical distribution of its posts, and their open availability. This work presents a study of public sentiment in Mexico during one of the waves that caused the most infections and deaths in the country. A mixed, semi-supervised approach was used: a lexicon-based technique labeled the data, which were then used to fine-tune a Transformer model pre-trained entirely in Spanish. Two Spanish-language models were trained by adding to the Transformer network a fine-tuning step for sentiment analysis specifically on COVID-19. In addition, ten other multilingual Transformer models that include Spanish were trained with the same data set and parameters to compare their performance. Other classifiers, namely Support Vector Machines, Naive Bayes, Logistic Regression, and Decision Trees, were also trained and tested on the same data set. Their performance was compared with that of the Spanish-only Transformer-based model, which achieved higher precision. Finally, this Spanish-only model was applied to new data to measure the sentiment about COVID-19 of the Twitter community in Mexico.
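As an illustration of the lexicon-based labeling step mentioned above, the following is a minimal sketch and not the procedure used in the study. The word lists `POSITIVE` and `NEGATIVE`, the function name `label_tweet`, and the three-class scheme are all assumptions made for this example; the abstract does not specify the lexicon or the label set.

```python
import re

# Hypothetical toy lexicon for illustration only; the study's actual
# Spanish sentiment lexicon is not given in the text.
POSITIVE = {"bueno", "esperanza", "gracias", "feliz", "recuperado"}
NEGATIVE = {"muerte", "miedo", "triste", "contagio", "crisis"}


def label_tweet(text):
    """Assign a weak sentiment label by counting lexicon hits."""
    # Lowercase and keep alphabetic tokens, including Spanish accented letters.
    tokens = re.findall(r"[a-záéíóúüñ]+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tweets = [
    "Gracias a los médicos, hay esperanza",
    "Tengo miedo del contagio",
    "Hoy es lunes",
]
# Weakly labeled pairs that could then serve as training data for a classifier.
labeled = [(t, label_tweet(t)) for t in tweets]
```

Labels produced this way are noisy, which is why such a step is typically paired with a supervised model trained on the labeled data, as the semi-supervised approach here does.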