Optimizing Multi-Layer Perceptron Performance in Sentiment Classification through Neural Network Feature Extraction

Abstract

The Multi-Layer Perceptron (MLP) struggles with complex tasks: it captures hierarchical relationships poorly and tends to overfit high-dimensional data. This research proposes an enhanced MLP model for sentiment classification by integrating feature extraction layers from advanced neural networks, specifically the Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Bidirectional LSTM (Bi-LSTM). These layers aim to improve the model's representational capacity by capturing more nuanced features. To evaluate the performance of the augmented MLP model, metrics such as accuracy, precision, recall, F1-score, and the Area Under the Receiver Operating Characteristic Curve (ROC-AUC) were employed. A key metric is the delta value, the change in ROC-AUC, used to assess the significance of the enhancements. Integrating a CNN as the feature extraction layer yielded the best ROC-AUC results, achieving values of 93.30% and 93.00%, improvements of 0.51% and 4.46% over the baseline model. These findings indicate that adding feature extraction layers significantly enhances MLP performance in sentiment classification tasks. Future research may explore alternative neural networks as feature extractors to further advance MLP capabilities in complex NLP applications.
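The delta metric above can be made concrete. Below is a minimal pure-Python sketch (not the authors' code): ROC-AUC via its rank-statistic formulation, plus a delta helper that assumes the reported improvement is a relative percentage over the baseline (the abstract does not state whether the delta is absolute or relative).

```python
def roc_auc(y_true, scores):
    # Probability that a random positive outranks a random negative
    # (the Mann-Whitney U formulation of ROC-AUC); ties count as 0.5.
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def delta_roc_auc(auc_enhanced, auc_baseline):
    # Relative improvement in percent (an assumption about the paper's delta).
    return 100.0 * (auc_enhanced - auc_baseline) / auc_baseline
```

For instance, a relative delta of 0.51% over a baseline ROC-AUC of roughly 92.83% would reproduce the reported 93.30%.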

Similar Papers
  • Research Article
  • Cited by 1
  • 10.1016/j.cjph.2024.05.036
Convolutional and hybrid neural network for cluster membership
  • Jun 4, 2024
  • Chinese Journal of Physics
  • Yasuhiro Hashimoto + 1 more


  • Research Article
  • Cited by 3
  • 10.1016/j.procs.2024.04.002
Sentiment Analysis of Self Driving Car Dataset: A comparative study of Deep Learning approaches
  • Jan 1, 2024
  • Procedia Computer Science
  • Devshri Pandya + 1 more


  • Research Article
  • Cited by 36
  • 10.1016/j.apr.2023.101766
Graph convolutional network – Long short term memory neural network- multi layer perceptron- Gaussian progress regression model: A new deep learning model for predicting ozone concertation
  • Apr 18, 2023
  • Atmospheric Pollution Research
  • Mohammad Ehteram + 3 more


  • Research Article
  • 10.3389/fspas.2025.1629056
Modeling ring current proton distribution using MLP, CNN, LSTM, and transformer networks
  • Oct 1, 2025
  • Frontiers in Astronomy and Space Sciences
  • Jinxing Li + 8 more

This study aims at developing ring current proton flux models using four neural network architectures: a multilayer perceptron (MLP), a convolutional neural network (CNN), a long short-term memory (LSTM) network, and a Transformer network. All models take time sequences of geomagnetic indices as inputs. Experimental results demonstrate that the LSTM and Transformer models consistently outperform the MLP and CNN models by achieving lower mean squared errors on the test set, possibly due to their intrinsic capability to process temporal sequential input data. Unlike MLP and CNN models, which require a fixed input history length even though proton lifetime varies with altitude, the LSTM and Transformer models accommodate variable-length sequences during both training and inference. Our findings indicate that the LSTM and Transformer architectures are well suited for modeling ring current proton behavior when GPU resources are available, and the Transformer slightly underperforms the LSTM model due to the restriction on the number of total heads. For resource-constrained environments, however, the MLP model offers a practical alternative, with faster training and inference times, while maintaining competitive accuracy.
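The fixed-versus-variable input-length distinction the abstract draws can be illustrated with a toy recurrent cell. The sketch below is illustrative only (the weights `w_x` and `w_h` are arbitrary assumptions, not the paper's model): because the same weights are reused at every step, it accepts any sequence length, whereas an MLP would require a fixed-size flattened input.

```python
import math

def rnn_last_hidden(seq, w_x=0.5, w_h=0.9, h0=0.0):
    # Minimal single-unit recurrent cell: h_t = tanh(w_x * x_t + w_h * h_{t-1}).
    # The loop runs once per element, so sequence length is unconstrained.
    h = h0
    for x in seq:
        h = math.tanh(w_x * x + w_h * h)
    return h
```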

  • Research Article
  • Cited by 15
  • 10.1109/access.2020.3002346
Medicine Expenditure Prediction via a Variance- Based Generative Adversarial Network
  • Jan 1, 2020
  • IEEE Access
  • Shruti Kaushik + 4 more

Machine learning (ML) offers a wide range of techniques to predict medicine expenditures using historical expenditures data as well as other healthcare variables. For example, researchers have developed multilayer perceptron (MLP), long short-term memory (LSTM), and convolutional neural network (CNN) models for predicting healthcare outcomes. However, recently proposed generative approaches (e.g., generative adversarial networks; GANs) are yet to be explored for time-series prediction of medicine-related expenditures. The primary objective of this research was to develop and test a generative adversarial network model (called “variance-based GAN or V-GAN”) that specifically minimizes the difference in variance between model and actual data during model training. For our model development, we used patient expenditure data of a popular pain medication in the US. In the V-GAN model, we used an LSTM model as a generator network and a CNN model or an MLP model as a discriminator network. The V-GAN model's performance was compared with other GAN variants and ML models proposed in prior research such as linear regression (LR), gradient boosting regression (GBR), MLP, and LSTM. Results revealed that the V-GAN model using an LSTM generator and a CNN discriminator outperformed other GAN-based prediction models, as well as the LR, GBR, MLP, and LSTM models in correctly predicting medicine expenditures of patients. Through this research, we highlight the utility of developing GAN-based architectures involving variance minimization for predicting patient-related expenditures in the healthcare domain.
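The variance-matching idea behind V-GAN can be sketched as an extra loss term. The helper below is an illustrative assumption of how such a penalty might look, not the paper's implementation: it penalizes the squared difference between the sample variances of real and generated series.

```python
def variance_penalty(real, fake):
    # Squared gap between sample variances of real and generated data;
    # a sketch of the variance-minimization term V-GAN adds during training.
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    return (var(real) - var(fake)) ** 2
```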

  • Research Article
  • Cited by 132
  • 10.3389/fdata.2020.00004
AI in Healthcare: Time-Series Forecasting Using Statistical, Neural, and Ensemble Architectures.
  • Mar 19, 2020
  • Frontiers in big data
  • Shruti Kaushik + 6 more

Both statistical and neural methods have been proposed in the literature to predict healthcare expenditures. However, less attention has been given to comparing predictions from both these methods as well as ensemble approaches in the healthcare domain. The primary objective of this paper was to evaluate different statistical, neural, and ensemble techniques in their ability to predict patients' weekly average expenditures on certain pain medications. Two statistical models, persistence (baseline) and autoregressive integrated moving average (ARIMA), a multilayer perceptron (MLP) model, a long short-term memory (LSTM) model, and an ensemble model combining predictions of the ARIMA, MLP, and LSTM models were calibrated to predict the expenditures on two different pain medications. In the MLP and LSTM models, we compared the influence of shuffling of training data and dropout of certain nodes in MLPs and nodes and recurrent connections in LSTMs in layers during training. Results revealed that the ensemble model outperformed the persistence, ARIMA, MLP, and LSTM models across both pain medications. In general, not shuffling the training data and adding the dropout helped the MLP models and shuffling the training data and not adding the dropout helped the LSTM models across both medications. We highlight the implications of using statistical, neural, and ensemble methods for time-series forecasting of outcomes in the healthcare domain.
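The ensemble described above combines forecasts from the ARIMA, MLP, and LSTM models. A minimal sketch of one plausible combination rule — simple pointwise averaging, an assumption, since the paper's exact weighting is not given here:

```python
def ensemble_average(preds_per_model):
    # Pointwise mean across model forecasts (e.g. ARIMA, MLP, LSTM),
    # one averaged value per time step.
    return [sum(vals) / len(vals) for vals in zip(*preds_per_model)]
```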

  • Research Article
  • Cited by 1
  • 10.12928/biste.v5i4.9668
Comparative Analysis of MLP, CNN, and RNN Models in Automatic Speech Recognition: Dissecting Performance Metric
  • Jan 8, 2024
  • Buletin Ilmiah Sarjana Teknik Elektro
  • Abraham K S Lenson + 1 more

This study conducts a comparative analysis of three prominent machine learning models: Multi-Layer Perceptrons (MLP), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN) with Long Short-Term Memory (LSTM) in the field of automatic speech recognition (ASR). This research is distinct in its use of the LibriSpeech 'test-clean' dataset, selected for its diversity in speaker accents and varied recording conditions, establishing it as a robust benchmark for ASR performance evaluation. Our approach involved preprocessing the audio data to ensure consistency and extracting Mel-Frequency Cepstral Coefficients (MFCCs) as the primary features, crucial for capturing the nuances of human speech. The models were meticulously configured with specific architectural details and hyperparameters. The MLP and CNN models were designed to maximize their pattern recognition capabilities, while the RNN (LSTM) was optimized for processing temporal data. To assess their performance, we employed metrics such as precision, recall, and F1-score. The MLP and CNN models demonstrated exceptional accuracy, with scores of 0.98 across these metrics, indicating their effectiveness in feature extraction and pattern recognition. In contrast, the LSTM variant of RNN showed lower efficacy, with scores below 0.60, highlighting the challenges in handling sequential speech data. The results of this study shed light on the differing capabilities of these models in ASR. While the high accuracy of MLP and CNN suggests potential overfitting, the underperformance of LSTM underscores the necessity for further refinement in sequential data processing. This research contributes to the understanding of various machine learning approaches in ASR and paves the way for future investigations. We propose exploring hybrid model architectures and enhancing feature extraction methods to develop more sophisticated, real-world ASR systems. 
Additionally, our findings underscore the importance of considering model-specific strengths and limitations in ASR applications, guiding the direction of future research in this rapidly evolving field.
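The precision, recall, and F1 metrics used to compare the MLP, CNN, and LSTM models can be computed directly. A self-contained sketch for binary labels (the study's own evaluation pipeline is not shown in the abstract):

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    # Count true positives, false positives, and false negatives,
    # then derive the three standard classification metrics.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```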

  • Research Article
  • Cited by 35
  • 10.3390/app121910156
Human Posture Detection Using Image Augmentation and Hyperparameter-Optimized Transfer Learning Algorithms
  • Oct 10, 2022
  • Applied Sciences
  • Roseline Oluwaseun Ogundokun + 2 more

With the advancement in pose estimation techniques, human posture detection recently received considerable attention in many applications, including ergonomics and healthcare. When using neural network models, overfitting and poor performance are prevalent issues. Recently, convolutional neural networks (CNNs) were successfully used for human posture recognition from human images due to their superior multiscale high-level visual representations over hand-engineering low-level characteristics. However, calculating millions of parameters in a deep CNN requires a significant number of annotated examples, which prohibits many deep CNNs such as AlexNet and VGG16 from being used on issues with minimal training data. We propose a new three-phase model for decision support that integrates CNN transfer learning, image data augmentation, and hyperparameter optimization (HPO) to address this problem. The model is used as part of a new decision support framework for the optimization of hyperparameters for AlexNet, VGG16, CNN, and multilayer perceptron (MLP) models for accomplishing optimal classification results. The AlexNet and VGG16 transfer learning algorithms with HPO are used for human posture detection, while CNN and Multilayer Perceptron (MLP) were used as standard classifiers for contrast. The HPO methods are essential for machine learning and deep learning algorithms because they directly influence the behaviors of training algorithms and have a major impact on the performance of machine learning and deep learning models. We used an image data augmentation technique to increase the number of images to be used for model training to reduce model overfitting and improve classification performance using the AlexNet, VGG16, CNN, and MLP models. The optimal combination of hyperparameters was found for the four models using a random-based search strategy. The MPII human posture datasets were used to test the proposed approach. 
The proposed models achieved an accuracy of 91.2% using AlexNet, 90.2% using VGG16, 87.5% using CNN, and 89.9% using MLP. The study is the first HPO study executed on the MPII human pose dataset.
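The random-based search strategy mentioned above can be sketched in a few lines. The search space and scoring function below are hypothetical placeholders, not the ones used in the study:

```python
import random

def random_search(space, score_fn, n_trials=20, seed=0):
    # Random-based HPO: sample configurations from the space and keep
    # the best-scoring one seen across n_trials draws.
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {k: rng.choice(v) for k, v in space.items()}
        s = score_fn(cfg)
        if s > best_score:
            best_cfg, best_score = cfg, s
    return best_cfg, best_score

# Hypothetical space; score_fn would normally train and validate a model.
space = {"learning_rate": [0.1, 0.01, 0.001], "units": [8, 16]}
best, score = random_search(space, lambda cfg: cfg["units"])
```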

  • Research Article
  • 10.1177/20552076251393338
Ensemble learning for improved sentiment analysis in doctor–patient communication
  • Oct 30, 2025
  • Digital Health
  • Yufan Ge + 3 more

Objective: To fill the benchmarking gap in clinician–patient sentiment analysis, we compare deep learning, transformer, and ensemble models for three-class (low/medium/high) sentiment classification in doctor–patient consultations. Methods: We used a publicly available dataset of 3325 anonymized doctor–patient consultations from the Hugging Face repository (mahfoos/Patient-Doctor-Conversation) labeled as low, medium, or high severity. Preprocessing included text cleaning, tokenization, and padding; class balancing was applied only within the training split of each fold. Models evaluated were long short-term memory (LSTM), bidirectional LSTM (BiLSTM), convolutional neural networks (CNN), CNN–LSTM, and bidirectional encoder representations from transformers (BERT); an ensemble (hard voting over Logistic Regression, Random Forest, and Support Vector Classifier (SVC)) was also tested. Evaluation used stratified five-fold cross-validation, with metrics reported as mean ± SD across outer test folds (accuracy; macro-averaged precision/recall/F1). Interpretability was examined via BERT attention and feature attributions. Results: The ensemble achieved the highest accuracy (75.5 ± 0.5), outperforming BERT (66.98 ± 0.6), CNN–LSTM (65.68 ± 0.9), CNN (64.17 ± 0.8), BiLSTM (64.82 ± 0.7), and LSTM (58.66 ± 0.19). Class-wise analysis showed robust detection of high-severity interactions (e.g. ensemble F1 = 90.8 ± 1.3), while low-severity remained most challenging; the ensemble improved class 0 recall (58.7 ± 1.0), and BERT provided the highest class 0 precision (65.5 ± 1.0). Conclusion: Under stratified five-fold cross-validation, ensemble learning delivered the strongest and most balanced performance for three-class sentiment classification of clinician–patient dialogue, while transformers offered complementary precision on difficult cases. Attention- and feature-attribution analyses improved transparency, supporting clinical interpretability.
Future work should scale to larger, multimodal (text/audio/vision) and multilingual datasets, and develop privacy-preserving, lightweight models for real-time deployment in clinical settings.
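The ensemble's hard-voting rule is straightforward to sketch: each classifier votes a class label per sample and the majority wins. A minimal illustration (ties resolved by first-encountered label, which is an assumption):

```python
from collections import Counter

def hard_vote(predictions):
    # predictions: one list of per-sample labels per classifier.
    # For each sample, take the most common label across classifiers.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]
```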

  • Research Article
  • Cited by 38
  • 10.1155/2021/5360828
A Stock Closing Price Prediction Model Based on CNN‐BiSLSTM
  • Jan 1, 2021
  • Complexity
  • Haiyao Wang + 5 more

As the stock market is an important part of the national economy, more and more investors have begun to pay attention to the methods to improve the return on investment and effectively avoid certain risks. Many factors affect the trend of the stock market, and the relevant information has the nature of time series. This paper proposes a composite model CNN‐BiSLSTM to predict the closing price of the stock. Bidirectional special long short‐term memory (BiSLSTM) improved on bidirectional long short‐term memory (BiLSTM) adds 1 − tanh(x) function in the output gate which makes the model better predict the stock price. The model extracts advanced features that influence stock price through convolutional neural network (CNN), and predicts the stock closing price through BiSLSTM after the data processed by CNN. To verify the effectiveness of the model, the historical data of the Shenzhen Component Index from July 1, 1991, to October 30, 2020, are used to train and test the CNN‐BiSLSTM. CNN‐BiSLSTM is compared with multilayer perceptron (MLP), recurrent neural network (RNN), long short‐term memory (LSTM), BiLSTM, CNN‐LSTM, and CNN‐BiLSTM. The experimental results show that the mean absolute error (MAE), root‐mean‐squared error (RMSE), and R‐square (R2) evaluation indicators of the CNN‐BiSLSTM are all optimal. Therefore, CNN‐BiSLSTM can accurately predict the closing price of the Shenzhen Component Index of the next trading day, which can be used as a reference for the majority of investors to effectively avoid certain risks.
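The abstract states that BiSLSTM adds a 1 − tanh(x) function in the output gate. One plausible reading, sketched below, applies it to the cell state before multiplying by the output gate; the exact placement inside the gate is our assumption, not confirmed by the source.

```python
import math

def slstm_output(o_gate, cell_state):
    # Hypothetical modified output activation for BiSLSTM:
    # hidden = output_gate * (1 - tanh(cell_state)).
    return o_gate * (1.0 - math.tanh(cell_state))
```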

  • Research Article
  • Cited by 16
  • 10.3390/w7062707
Spatial Disaggregation of Areal Rainfall Using Two Different Artificial Neural Networks Models
  • Jun 5, 2015
  • Water
  • Sungwon Kim + 1 more

The objective of this study is to develop artificial neural network (ANN) models, including multilayer perceptron (MLP) and Kohonen self-organizing feature map (KSOFM), for spatial disaggregation of areal rainfall in the Wi-stream catchment, an International Hydrological Program (IHP) representative catchment, in South Korea. A three-layer MLP model, using three training algorithms, was used to estimate areal rainfall. The Levenberg–Marquardt training algorithm was found to be more sensitive to the number of hidden nodes than the conjugate gradient and quickprop training algorithms. Results showed that network structures of 11-5-1 (conjugate gradient and quickprop) and 11-3-1 (Levenberg–Marquardt) were best for estimating areal rainfall using the MLP model. The network structures of 1-5-11 (conjugate gradient and quickprop) and 1-3-11 (Levenberg–Marquardt), the inverses of the best MLP networks for estimating areal rainfall, were identified for spatial disaggregation of areal rainfall. The KSOFM model was compared with the MLP model for spatial disaggregation of areal rainfall. Both the MLP and KSOFM models could disaggregate areal rainfall into individual point rainfall with spatial concepts.

  • Research Article
  • Cited by 2
  • 10.1002/htj.23163
Utilizing multilayer perceptron for machine learning diagnosis in phase change material‐based thermal management systems
  • Aug 27, 2024
  • Heat Transfer
  • Abdul Arif + 3 more

Electric vehicles encounter significant challenges in colder climates due to reduced battery efficiency at low temperatures and increased electricity demand for cabin heating, which impacts vehicle propulsion. This study aims to address these challenges by implementing a thermal management system utilizing Phase Change Materials (PCMs) and validating the performance of a Multilayer Perceptron (MLP) model in predicting PCMs behavior and battery temperature distributions. The study employs an MLP model trained with 160 samples of diverse heat inputs, including pulsating, constant, wiener, discharging, and random temperatures. The model uses these temperatures as inputs and liquid fractions as target values. Performance evaluation is conducted using the MATLAB platform and is benchmarked against existing approaches, such as Long Short‐term Memory (LSTM), spatiotemporal convolutional neural network (CNN), and pooled CNN‐LSTM. The MLP model's accuracy in predicting PCMs phase transitions is validated by comparing predicted liquid fractions with numerically obtained values. Additionally, this study forecasts temperature distributions within a standard battery pack under various discharge scenarios, considering the performance of commercial lithium‐ion batteries. The proposed MLP model demonstrates high efficacy, achieving a correlation of up to 0.999 and root mean squared error below 0.013 compared with numerical results.
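RMSE, one of the validation metrics reported above, can be computed as follows (a generic sketch, not the study's MATLAB code):

```python
import math

def rmse(y_true, y_pred):
    # Root mean squared error between observed and predicted values.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```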

  • Research Article
  • Cited by 14
  • 10.1007/s00024-015-1065-2
Multilayer Perceptron Model for Nowcasting Visibility from Surface Observations: Results and Sensitivity to Dissimilar Station Altitudes
  • Mar 28, 2015
  • Pure and Applied Geophysics
  • Sutapa Chaudhuri + 3 more

The reduction in visibility during fog significantly influences surface as well as air transport operations. The prediction of fog remains difficult despite improvements in numerical weather prediction models. The present study aims at identifying a suitable neural network model with a proper architecture to provide precise nowcasts of horizontal visibility during fog over the airports of three significantly affected metropolises of India, namely: Kolkata (22°32′N; 88°20′E), Delhi (28°38′N; 77°12′E) and Bengaluru (12°95′N; 77°72′E). The investigation shows that the multilayer perceptron (MLP) model provides considerably less error in nowcasting visibility during fog over these metropolises than the radial basis function network, generalized regression neural network or linear neural network. MLP models of different architectures were trained with data and records from 2000 to 2010, and the model results were validated with observations from 2011 to 2014. Our results reveal that MLP models with different configurations [(1) four input layers, three hidden layers with three hidden nodes in each layer and a single output; (2) four input layers with two hidden layers having one hidden node in the first hidden layer and two hidden nodes in the second hidden layer, and a single output layer; and (3) four input layers with two hidden layers having two hidden nodes in each hidden layer and a single output layer] provide minimum error in nowcasting visibility during fog over the airports of Kolkata, Delhi and Bengaluru, respectively. The results show that the MLP model is well suited for nowcasting visibility during fog with a 6 h lead time; however, the study reveals that the MLP model is sensitive to dissimilar station altitudes, as the minimum prediction error for the three metropolises, which have dissimilar mean sea level altitudes, is obtained through different configurations of the model.
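Architectures like the configurations described above (e.g. a 4-3-3-3-1 network for Kolkata) can be compared by parameter count. A small sketch assuming fully connected layers with biases (an assumption; the abstract does not detail the layer type beyond MLP):

```python
def mlp_param_count(layer_sizes):
    # Weights plus biases for each consecutive pair of dense layers,
    # e.g. [4, 3, 3, 3, 1] for the four-input, three-hidden-layer network.
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
```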

  • Book Chapter
  • Cited by 11
  • 10.1007/978-3-319-57454-7_41
A P-LSTM Neural Network for Sentiment Classification
  • Jan 1, 2017
  • Chi Lu + 4 more

Neural network models have been demonstrated to be capable of achieving remarkable performance in sentiment classification. Convolutional neural network (CNN) and recurrent neural network (RNN) are two mainstream architectures for such modelling task. In this work, a novel model based on long short-term memory recurrent neural network (LSTM) called P-LSTM is proposed for sentiment classification. In P-LSTM, three-words phrase embedding is used instead of single word embedding as is often done. Besides, P-LSTM introduces the phrase factor mechanism which combines the feature vectors of the phrase embedding layer and the LSTM hidden layer to extract more exact information from the text. The experimental results show that the P-LSTM achieves excellent performance on the sentiment classification tasks.
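The three-word phrase embedding that P-LSTM uses in place of single-word embedding can be sketched as a sliding window over the token sequence. In the real model each window maps to a learned phrase vector; the sketch below only forms the windows:

```python
def phrase_windows(tokens, n=3):
    # Slide an n-token window to form the phrase units P-LSTM embeds
    # instead of single words (window formation only, no learned vectors).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```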

  • Research Article
  • Cited by 3
  • 10.18280/ria.340418
Text Sentiment Classification Based on Feature Fusion
  • Sep 30, 2020
  • Revue d'Intelligence Artificielle
  • Chen Zhang + 2 more

The convolutional neural network (CNN) and long short-term memory (LSTM) network are adept at extracting local and global features, respectively. Both can achieve excellent classification effects. However, the CNN performs poorly in extracting the global contextual information of the text, while LSTM often overlooks the features hidden between words. For text sentiment classification, this paper combines the CNN with bidirectional LSTM (BiLSTM) into a parallel hybrid model called CNN_BiLSTM. Firstly, the CNN was adopted to extract the local features of the text quickly. Next, the BiLSTM was employed to obtain the global text features containing contextual semantics. After that, the features extracted by the two neural networks (NNs) were fused, and processed by Softmax classifier for text sentiment classification. To verify its performance, the CNN_BiLSTM was compared with single NNs like CNN and LSTM, as well as other deep learning (DL) NNs through experiments. The experimental results show that the proposed parallel hybrid model outperformed the contrastive methods in F1-score and accuracy. Therefore, our model can solve text sentiment classification tasks effectively, and boast better practical value than other NNs.
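The fusion step in CNN_BiLSTM concatenates the CNN's local features with the BiLSTM's global features before the Softmax classifier. A minimal sketch of that fusion with vectors as plain lists (the real model fuses learned tensors):

```python
def fuse_features(local_feats, global_feats):
    # Concatenate CNN local features with BiLSTM global contextual features
    # into a single vector fed to the downstream Softmax classifier.
    return local_feats + global_feats
```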
