Transformer and Pre-Transformer Model-Based Sentiment Prediction with Various Embeddings: A Case Study on Amazon Reviews
Sentiment analysis is essential for understanding consumer opinions, yet selecting the optimal models and embedding methods remains challenging, especially when handling ambiguous expressions, slang, or mismatched sentiment–rating pairs. This study provides a comprehensive comparative evaluation of sentiment classification models across three paradigms: traditional machine learning, pre-transformer deep learning, and transformer-based models. Using the Amazon Magazine Subscriptions 2023 dataset, we evaluate a range of embedding techniques, including static embeddings (GloVe, FastText) and contextual transformer embeddings (BERT, DistilBERT, etc.). To capture predictive confidence and model uncertainty, we include categorical cross-entropy as a key evaluation metric alongside accuracy, precision, recall, and F1-score. In addition to detailed quantitative comparisons, we conduct a systematic qualitative analysis of misclassified samples to reveal model-specific patterns of uncertainty. Our findings show that FastText consistently outperforms GloVe in both traditional and LSTM-based models, particularly in recall, due to its subword-level semantic richness. Transformer-based models demonstrate superior contextual understanding and achieve the highest accuracy (92%) and lowest cross-entropy loss (0.25) with DistilBERT, indicating well-calibrated predictions. To validate the generalisability of our results, we replicated our experiments on the Amazon Gift Card Reviews dataset, where similar trends were observed. We also adopt a resource-aware approach by reducing the dataset size from 25 K to 20 K to reflect real-world hardware constraints. This study contributes to both sentiment analysis and sustainable AI by offering a scalable, entropy-aware evaluation framework that supports informed, context-sensitive model selection for practical applications.
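The abstract above treats categorical cross-entropy as a calibration-aware metric alongside accuracy. As a minimal illustration (plain NumPy, toy probabilities, not the study's models), two classifiers with identical accuracy can differ sharply in cross-entropy, which is how the loss exposes under-confident predictions:

```python
import numpy as np

def categorical_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean negative log-likelihood of the true class.

    y_true: integer class labels, shape (n,)
    y_prob: predicted class probabilities, shape (n, k)
    """
    p = np.clip(y_prob, eps, 1.0)
    return float(-np.mean(np.log(p[np.arange(len(y_true)), y_true])))

# Two hypothetical models, both 100% accurate on these three samples,
# but one assigns far more probability mass to the correct class.
y_true = np.array([0, 1, 1])
confident = np.array([[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]])
hesitant  = np.array([[0.6, 0.4], [0.4, 0.6], [0.5, 0.5]])

print(categorical_cross_entropy(y_true, confident))  # lower loss
print(categorical_cross_entropy(y_true, hesitant))   # higher loss
```

Lower loss with equal accuracy is what the abstract means by "well-calibrated predictions" for DistilBERT.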
- Conference Article
7
- 10.1109/inista55318.2022.9894051
- Aug 8, 2022
Natural Language Processing (NLP) is an interdisciplinary field between linguistics and computer science. Its main aim is to process natural (human) language using computer programs. Text classification is one of the main tasks of this field, and it is widely used in many applications such as spam filtering, sentiment analysis, and document categorization. Nonetheless, there is very little text classification work in the law domain, and even less for the Turkish language. This may be attributed to the complexity of the domain. The length and complexity of documents and the extensive use of technical jargon are some of the features that distinguish this domain from others. Similar to the medical domain, understanding these documents requires extensive specialization. Another reason may be the scarcity of publicly available datasets. In this study, we compile sizeable unsupervised and supervised datasets from publicly available sources and experiment with several classification algorithms, ranging from traditional classifiers to much more complicated deep learning and transformer-based models, along with different text representations. We focus on classifying Court of Cassation decisions by their crime labels. Interestingly, the majority of the models we experimented with were able to obtain good results. This suggests that although understanding documents in the legal domain is complicated and requires human expertise, it may be relatively easier for machine learning models despite the extensive presence of technical terms. This seems to be especially the case for transformer-based pre-trained neural language models, which can be adapted to the law domain, showing high potential for future real-world applications.
- News Article
1
- 10.1016/s1351-4180(13)70383-0
- Sep 25, 2013
- Focus on Catalysts
Base oils made from light olefins using an ionic liquid catalyst
- Research Article
47
- 10.1016/j.combustflame.2016.07.004
- Jul 25, 2016
- Combustion and Flame
Chemical kinetic model uncertainty minimization through laminar flame speed measurements
- Research Article
41
- 10.1016/j.jhydrol.2023.129684
- May 18, 2023
- Journal of Hydrology
Runoff predictions in new-gauged basins using two transformer-based models
- Research Article
15
- 10.1029/2022wr033939
- Mar 1, 2023
- Water Resources Research
Estimating uncertainty in flood model predictions is important for many applications, including risk assessment and flood forecasting. We focus on uncertainty in physics‐based urban flooding models. We consider the effects of the model's complexity and uncertainty in key input parameters. The effect of rainfall intensity on the uncertainty in water depth predictions is also studied. As a test study, we choose the Interconnected Channel and Pond Routing (ICPR) model of a part of the city of Minneapolis. The uncertainty in the ICPR model's predictions of the floodwater depth is quantified in terms of the ensemble variance using the multilevel Monte Carlo (MC) simulation method. Our results show that uncertainties in the studied domain are highly localized. Model simplifications, such as disregarding the groundwater flow, lead to overly confident predictions, that is, predictions that are both less accurate and less uncertain than those of the more complex model. We find that for the same number of uncertain parameters, increasing the model resolution reduces uncertainty in the model predictions (and increases the MC method's computational cost). We employ the multilevel MC method to reduce the cost of estimating uncertainty in a high‐resolution ICPR model. Finally, we use the ensemble estimates of the mean and covariance of the flood depth for real‐time flood depth forecasting using the physics‐informed Gaussian process regression method. We show that even with few measurements, the proposed framework results in a more accurate forecast than that provided by the mean prediction of the ICPR model.
- Research Article
2
- 10.1016/j.ecolmodel.2022.110233
- Dec 7, 2022
- Ecological Modelling
Effect of variation in the observations on the prediction uncertainty in crop model simulation: Use ORYZA (v3) as a case study
- Research Article
1
- 10.34185/1562-9945-5-154-2024-13
- Oct 3, 2024
- System technologies
Recent advancements in text classification have focused on the application of machine learning and deep learning techniques. Traditional methods such as Naive Bayes, Logistic Regression, and Support Vector Machines (SVM) have been widely utilized due to their efficiency and simplicity. However, the advent of deep learning has introduced more complex models like Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN), which can automatically extract features and detect intricate patterns in textual data. Additionally, transformer-based models such as BERT have set new benchmarks in text classification tasks. Despite their high accuracy, these models require substantial computational resources and are not always practical for every application. The ongoing research aims to balance accuracy and computational efficiency. Purpose of Research. The primary objective of this study is to review and compare various methods for automated text classification based on sentiment analysis. This research aims to evaluate the prediction accuracy of different models, including traditional machine learning algorithms and modern deep learning approaches, and to provide insights into their practical applications and limitations. Presentation of the Main Research Material. This study utilizes the "IMDB Dataset of 50K Movie Reviews" to train and test various text classification models. The dataset comprises movie reviews and their associated sentiment labels, either positive or negative. The research employs several preprocessing steps. For feature extraction, methods such as Bag-of-Words (BoW), TF-IDF (Term Frequency-Inverse Document Frequency), and Word2Vec are used. These features are then fed into various classifiers: Naive Bayes, Support Vector Machines (SVM), Logistic Regression, and deep learning models. Conclusions.
The comparative analysis reveals that while traditional machine learning methods like Naive Bayes, SVM, and Logistic Regression are efficient and easy to implement, deep learning models offer superior accuracy by capturing more complex patterns in the data. However, the computational demands of deep learning models, particularly transformers, limit their applicability in resource-constrained environments. Future research should focus on optimizing these models to balance accuracy and computational efficiency, making advanced text classification accessible for a broader range of applications.
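The TF-IDF-plus-linear-classifier baselines described in this abstract can be sketched with scikit-learn (assumed here; the four toy reviews below stand in for the IMDB dataset and are not the study's data or code):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in for the movie reviews (labels: 1 = positive, 0 = negative).
texts = [
    "a wonderful, moving film with great performances",
    "brilliant direction and a touching story",
    "dull, predictable, and far too long",
    "a boring mess with terrible acting",
]
labels = [1, 1, 0, 0]

# TF-IDF features feeding a linear classifier, mirroring the
# traditional baselines compared in the study.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["great film"]))
```

Swapping `LogisticRegression` for `MultinomialNB` or `LinearSVC` reproduces the other traditional baselines on the same feature pipeline.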
- Research Article
1
- 10.7717/peerj-cs.2644
- Mar 24, 2025
- PeerJ. Computer science
Social media platforms such as X, Facebook, and Instagram have become essential avenues for individuals to articulate their opinions, especially during global emergencies. These platforms offer valuable insights that necessitate analysis for informed decision-making and a deeper understanding of societal trends. Sentiment analysis is crucial for assessing public sentiment toward specific issues; however, applying it to dialectal Arabic presents considerable challenges in natural language processing. The complexity arises from the language's intricate semantic and morphological structures, along with the existence of multiple dialects. This form of analysis, also referred to as sentiment classification, opinion mining, emotion mining, and review mining, is the focus of this study, which analyzes tweets from three benchmark datasets: the Arabic Sentiment Tweets Dataset (ASTD), the A Twitter-based Benchmark Arabic Sentiment Analysis Dataset (ASAD), and the Tweets Emoji Arabic Dataset (TEAD). The research involves experimentation with a variety of comparative models, including machine learning, deep learning, transformer-based models, and a transformer-based ensemble model. Feature extraction for both machine learning and deep learning approaches is performed using techniques such as AraVec, FastText, AraBERT, and Term Frequency-Inverse Document Frequency (TF-IDF). The study compares machine learning models such as support vector machine (SVM), naïve Bayes (NB), decision tree (DT), and extreme gradient boosting (XGBoost) with deep learning models such as convolutional neural networks (CNN) and bidirectional long short-term memory (BLSTM) networks. Additionally, it explores transformer-based models such as CAMeLBERT, XLM-RoBERTa, and MARBERT, along with their ensemble configurations. 
The findings demonstrate that the proposed transformer-based ensemble model achieved superior performance, with average accuracy, recall, precision, and F1-score of 90.4%, 88%, 87.3%, and 87.7%, respectively.
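A soft-voting ensemble of the kind this abstract describes can be sketched by averaging class probabilities across member models; the NumPy sketch below uses hypothetical probability matrices, not actual outputs of CAMeLBERT, XLM-RoBERTa, or MARBERT:

```python
import numpy as np

def soft_vote(prob_list):
    """Average class-probability matrices from member models
    and return the arg-max class per sample."""
    avg = np.mean(np.stack(prob_list), axis=0)
    return avg.argmax(axis=1)

# Hypothetical outputs of three fine-tuned transformer models
# for two samples over three sentiment classes (neg/neu/pos).
m1 = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
m2 = np.array([[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]])
m3 = np.array([[0.5, 0.4, 0.1], [0.3, 0.3, 0.4]])

print(soft_vote([m1, m2, m3]))  # → [0 2]
```

Averaging probabilities rather than hard votes lets a confident member outweigh two hesitant ones, which is one reason such ensembles often edge out their best single model.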
- Conference Article
4
- 10.1109/sceecs54111.2022.9741028
- Feb 19, 2022
India has one of the largest user bases for Internet-based services. In 2021, India had 624 million internet users, the second largest in the world (https://datareportal.com/reports/digital-2021-india). With so many internet users, each generating lots of textual data, having tools to analyze the data can be very helpful to a wide variety of people, including researchers, marketers, and product managers. Research regarding Sentiment Analysis in English is plentiful, but we need a different method to perform the same in Hindi, one of the most popular languages of the Indian subcontinent. In this paper, we use Transformer-based pre-trained models on Hindi Sentiment Analysis tasks. The sentiment analysis task is done on a dataset that has Hindi text and its corresponding sentiment as Positive, Negative, or Neutral. The Hindi text contains sentences collected from various sites, primarily product and movie reviews. The sentiment analysis is done using five different transformer-based models, of which a few have been trained for multiple languages while the others have been fine-tuned specifically for the Hindi language. We also compare the performance of a few different multilingual models on sentiment analysis tasks. Out of all the models compared, we get the best accuracy of 82% from the Hindi Microsoft Multilingual-MiniLM-L12-H384 model.
- Research Article
- 10.2196/64723
- Oct 15, 2025
- JMIR Formative Research
Background: In the digital age, social media has become a crucial platform for public discourse on diverse health-related topics, including vaccines. Efficient sentiment analysis and hesitancy detection are essential for understanding public opinions and concerns. Large language models (LLMs) offer advanced capabilities for processing complex linguistic patterns, potentially providing valuable insights into vaccine-related discourse. Objective: This study aims to evaluate the performance of various LLMs in sentiment analysis and hesitancy detection related to vaccine discussions on social media and identify the most efficient, accurate, and cost-effective model for detecting vaccine-related public sentiment and hesitancy trends. Methods: We used several LLMs (generative pretrained transformer (GPT-3.5), GPT-4, Claude-3 Sonnet, and Llama 2) to process and classify complex linguistic data related to human papillomavirus; measles, mumps, and rubella; and vaccines overall from X (formerly known as Twitter), Reddit, and YouTube. The models were tested across different learning paradigms (zero-shot, 1-shot, and few-shot) to determine their adaptability and learning efficiency with varying amounts of training data. We evaluated the models' performance using accuracy, F1-score, precision, and recall. In addition, we conducted a cost analysis focused on token usage to assess the computational efficiency of each approach. Results: GPT-4 (F1-score=0.85 and accuracy=0.83) outperformed GPT-3.5, Llama 2, and Claude-3 Sonnet across various metrics, regardless of the sentiment type or learning paradigm. Few-shot learning did not significantly enhance performance compared with the zero-shot paradigm. Moreover, the increased computational costs and token usage associated with few-shot learning did not justify its application, given the marginal improvement in model performance.
The analysis highlighted challenges in classifying neutral sentiments and convenience, correctly interpreting sarcasm, and accurately identifying indirect expressions of vaccine hesitancy, emphasizing the need for model refinement. Conclusions: GPT-4 emerged as the most accurate model, excelling in sentiment and hesitancy analysis. Performance differences between learning paradigms were minimal, making zero-shot learning preferable for its balance of accuracy and computational efficiency. However, the zero-shot GPT-4 model is not the most cost-effective compared with traditional machine learning. A hybrid approach, using LLMs for initial annotation and traditional models for training, could optimize cost and performance. Despite reliance on specific LLM versions and a limited focus on certain vaccine types and platforms, our findings underscore the capabilities and limitations of LLMs in vaccine sentiment and hesitancy analysis, highlighting the need for ongoing evaluation and adaptation in public health communication strategies.
- Research Article
4
- 10.1155/2019/5076438
- Feb 19, 2019
- Mathematical Problems in Engineering
The model uncertainty in prediction of facing tensile forces using the default Federal Highway Administration (FHWA) simplified equation is assessed in this study based on the Bayesian inference method and a large number of measured lower- and upper-bound facing tensile force data collected from the literature. Model uncertainty was quantified by model bias, which is the ratio of measured to nominal facing tensile force. The Bayesian assessment was carried out assuming normal and lognormal distributions of model bias. Based on the collected facing tensile force data, it is shown that both the on-average accuracy and the spread in prediction accuracy of the default FHWA simplified facing tensile force equation depend largely upon the distribution assumptions. Two regression approaches were used to calibrate the default FHWA simplified facing tensile force equation for accuracy improvement. The Bayesian Information Criterion was adopted to quantitatively compare the rationality of the competing normal and lognormal statistical models intended to describe the model bias. A case study is provided at the end to demonstrate both the importance of model uncertainty and the influence of distribution assumptions on model bias in reliability-based design of soil nail walls against the facing flexural limit state.
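The model factor (measured-to-nominal ratio) and the normal-versus-lognormal comparison via the Bayesian Information Criterion described in this abstract can be illustrated with SciPy; the forces below are synthetic, not the study's database:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-in for nominal (predicted) and measured facing
# tensile forces; the model factor (bias) is their ratio.
nominal = rng.uniform(10.0, 50.0, size=120)
measured = nominal * rng.lognormal(mean=0.0, sigma=0.5, size=120)
bias = measured / nominal  # model factor

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k parameters, n samples."""
    return k * np.log(n) - 2.0 * log_likelihood

n = len(bias)
# Candidate 1: normal distribution fitted to the bias.
mu, sd = bias.mean(), bias.std(ddof=0)
bic_norm = bic(stats.norm.logpdf(bias, mu, sd).sum(), 2, n)
# Candidate 2: lognormal distribution (fit on log of bias).
lm, ls = np.log(bias).mean(), np.log(bias).std(ddof=0)
bic_logn = bic(stats.lognorm.logpdf(bias, s=ls, scale=np.exp(lm)).sum(), 2, n)

print(f"normal BIC={bic_norm:.1f}, lognormal BIC={bic_logn:.1f}")
# The distribution with the lower BIC is better supported by the data.
```

Because the synthetic bias is generated lognormally, the lognormal candidate should win here; with real load-test data the comparison is exactly what the BIC step in the study adjudicates.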
- Conference Article
10
- 10.1109/ichi54592.2022.00035
- Jun 1, 2022
Drug review websites such as Drugs.com provide users' textual reviews and numeric ratings of drugs. These reviews, along with the ratings, are used by consumers when choosing a drug. However, the numeric ratings may not always be consistent with the text reviews, and relying purely on the rating score to find positive/negative reviews may not be reliable. Automatic classification of user ratings based on the textual review can create a more reliable rating for drugs. In this project, we built models to classify drug review ratings from textual reviews using traditional machine learning and deep learning models. Traditional machine learning models, including Random Forest and Naive Bayes classifiers, were built using TF-IDF features as input. In addition, transformer-based neural network models, including BERT, Bio_ClinicalBERT, RoBERTa, XLNet, ELECTRA, and ALBERT, were built using the raw text as input. Overall, the Bio_ClinicalBERT model outperformed the other models with an overall accuracy of 87%. We further identified concepts from the Unified Medical Language System (UMLS) in the postings and analyzed their semantic types stratified by class type. This research demonstrated that transformer-based models can classify drug reviews based solely on textual reviews.
- Research Article
41
- 10.1139/cgj-2018-0386
- Aug 1, 2019
- Canadian Geotechnical Journal
This paper summarizes 239 static load tests to evaluate the performance of four static design methods for axial resistance of driven piles in clay. The methods are ISO 19901-4:2016, SHANSEP, ICP-05, and NGI-05. The database is categorized into four groups depending on the load type (compression or uplift) and pile tip condition (open or closed end). The model uncertainty in resistance prediction is quantified as a ratio between measured and calculated resistance, which is called a model factor. The measured resistance is interpreted as a load producing a settlement level of 10% pile diameter. Database studies show that the four methods present a similar accuracy, where the mean and coefficient of variation (COV) of the model factor are around 1 and 0.3, respectively. The COV values are smaller than those for driven piles in sand available in literature. The model statistics determined from the database are applicable to a simplified or full probabilistic form of reliability-based design (RBD) of driven piles in clay. As an illustration, the resistance factors in load and resistance factor design (LRFD, a simplified form of RBD) are calibrated by Monte Carlo simulations.
- Research Article
10
- 10.1016/j.jnnfm.2019.07.002
- Jul 10, 2019
- Journal of Non-Newtonian Fluid Mechanics
Uncertainty propagation in simulation predictions of generalized Newtonian fluid flows
- Conference Article
4
- 10.1109/bigdata52589.2021.9671854
- Dec 15, 2021
In this paper, we provide a sentiment analysis of conversations surrounding Covid-19 vaccine adoption on Twitter. We focus on key regions of the US, particularly urban areas with high African American populations. We utilize machine learning models such as logistic regression, Support Vector Machines, and Naive Bayes as baseline models. Furthermore, we develop fine-tuned Transformer-based language models that classify sentiments with high accuracy. The results from our analysis show that fine-tuning a Transformer-based model, Covid-BERT v2, on our dataset performs better than our baseline models; however, the accuracy is still relatively low. This might be a result of the very limited training dataset. Future work will explore the use of a higher-quality dataset and also evaluate other transformer-based models.