Optimizing Multi-Layer Perceptron Performance in Sentiment Classification through Neural Network Feature Extraction
The Multi-Layer Perceptron (MLP) struggles with complex tasks because it has difficulty capturing hierarchical relationships and tends to overfit high-dimensional data. This research proposes an enhanced MLP model for sentiment classification that integrates feature extraction layers from advanced neural networks, specifically the Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Bidirectional LSTM (Bi-LSTM). These layers aim to improve the model's representational capacity by capturing more nuanced features. To evaluate the performance gains of the augmented MLP model, metrics such as accuracy, precision, recall, F1-score, and the Area Under the Curve for Receiver Operating Characteristics (ROC-AUC) were employed. A key focus is the delta value, the change in ROC-AUC, used to assess the significance of these enhancements. Integrating CNN as the feature extraction layer yielded the best ROC-AUC results, achieving values of 93.30% and 93.00%, improvements of 0.51% and 4.46% over the baseline model. These findings indicate that adding feature extraction layers significantly enhances MLP performance in sentiment classification tasks. Future research may explore alternative neural networks as feature extractors to further advance MLP capabilities in complex NLP applications.
Keywords: Multi-Layer Perceptron Model; Performance in Sentiment Classification; Feature Extraction Layers; Multi-Layer Perceptron; Sentiment Classification; Bidirectional Long Short-Term Memory; Sentiment Classification Tasks; Alternative Neural Networks; Advanced Neural Networks; Curve for Receiver Operating Characteristics
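The core idea of the abstract, placing a neural feature-extraction layer in front of an MLP classifier, can be sketched in miniature with the CNN variant: a few convolutional filters scan the token embeddings, global max-pooling collapses each filter's activations into a single feature, and a small MLP head maps the pooled vector to a sentiment probability. Everything below (dimensions, untrained random weights, names) is illustrative, not the paper's implementation:

```python
import math, random

random.seed(0)

def conv1d(seq, kernels):
    # seq: token embedding vectors; kernels: list of (width x dim) filters
    feats = []
    for k in kernels:
        w = len(k)
        acts = []
        for i in range(len(seq) - w + 1):
            s = sum(k[j][d] * seq[i + j][d]
                    for j in range(w) for d in range(len(seq[0])))
            acts.append(max(0.0, s))      # ReLU activation
        feats.append(max(acts))           # global max-pooling per filter
    return feats

def mlp_predict(x, w1, w2):
    # one hidden ReLU layer, sigmoid output for binary sentiment
    h = [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in w1]
    z = sum(wi * hi for wi, hi in zip(w2, h))
    return 1.0 / (1.0 + math.exp(-z))

# toy inputs: a 5-token sentence with 4-dim embeddings, six width-3 filters
sentence = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
kernels = [[[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
           for _ in range(6)]
w1 = [[random.uniform(-1, 1) for _ in range(6)] for _ in range(8)]
w2 = [random.uniform(-1, 1) for _ in range(8)]

p = mlp_predict(conv1d(sentence, kernels), w1, w2)
```

Swapping the `conv1d` stage for an LSTM or Bi-LSTM encoder gives the other two variants the abstract compares; the MLP head is unchanged.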
- Research Article
1
- 10.1016/j.cjph.2024.05.036
- Jun 4, 2024
- Chinese Journal of Physics
Convolutional and hybrid neural network for cluster membership
- Research Article
3
- 10.1016/j.procs.2024.04.002
- Jan 1, 2024
- Procedia Computer Science
Sentiment Analysis of Self Driving Car Dataset: A comparative study of Deep Learning approaches
- Research Article
36
- 10.1016/j.apr.2023.101766
- Apr 18, 2023
- Atmospheric Pollution Research
Graph convolutional network – Long short term memory neural network- multi layer perceptron- Gaussian progress regression model: A new deep learning model for predicting ozone concertation
- Research Article
- 10.3389/fspas.2025.1629056
- Oct 1, 2025
- Frontiers in Astronomy and Space Sciences
This study aims at developing ring current proton flux models using four neural network architectures: a multilayer perceptron (MLP), a convolutional neural network (CNN), a long short-term memory (LSTM) network, and a Transformer network. All models take time sequences of geomagnetic indices as inputs. Experimental results demonstrate that the LSTM and Transformer models consistently outperform the MLP and CNN models by achieving lower mean squared errors on the test set, possibly due to their intrinsic capability to process temporal sequential input data. Unlike MLP and CNN models, which require a fixed input history length even though proton lifetime varies with altitude, the LSTM and Transformer models accommodate variable-length sequences during both training and inference. Our findings indicate that the LSTM and Transformer architectures are well suited for modeling ring current proton behavior when GPU resources are available, and the Transformer slightly underperforms the LSTM model due to the restriction on the number of total heads. For resource-constrained environments, however, the MLP model offers a practical alternative, with faster training and inference times, while maintaining competitive accuracy.
- Research Article
15
- 10.1109/access.2020.3002346
- Jan 1, 2020
- IEEE Access
Machine learning (ML) offers a wide range of techniques to predict medicine expenditures using historical expenditures data as well as other healthcare variables. For example, researchers have developed multilayer perceptron (MLP), long short-term memory (LSTM), and convolutional neural network (CNN) models for predicting healthcare outcomes. However, recently proposed generative approaches (e.g., generative adversarial networks; GANs) are yet to be explored for time-series prediction of medicine-related expenditures. The primary objective of this research was to develop and test a generative adversarial network model (called “variance-based GAN or V-GAN”) that specifically minimizes the difference in variance between model and actual data during model training. For our model development, we used patient expenditure data of a popular pain medication in the US. In the V-GAN model, we used an LSTM model as a generator network and a CNN model or an MLP model as a discriminator network. The V-GAN model's performance was compared with other GAN variants and ML models proposed in prior research such as linear regression (LR), gradient boosting regression (GBR), MLP, and LSTM. Results revealed that the V-GAN model using an LSTM generator and a CNN discriminator outperformed other GAN-based prediction models, as well as the LR, GBR, MLP, and LSTM models in correctly predicting medicine expenditures of patients. Through this research, we highlight the utility of developing GAN-based architectures involving variance minimization for predicting patient-related expenditures in the healthcare domain.
- Research Article
132
- 10.3389/fdata.2020.00004
- Mar 19, 2020
- Frontiers in big data
Both statistical and neural methods have been proposed in the literature to predict healthcare expenditures. However, less attention has been given to comparing predictions from both these methods as well as ensemble approaches in the healthcare domain. The primary objective of this paper was to evaluate different statistical, neural, and ensemble techniques in their ability to predict patients' weekly average expenditures on certain pain medications. Two statistical models, persistence (baseline) and autoregressive integrated moving average (ARIMA), a multilayer perceptron (MLP) model, a long short-term memory (LSTM) model, and an ensemble model combining predictions of the ARIMA, MLP, and LSTM models were calibrated to predict the expenditures on two different pain medications. In the MLP and LSTM models, we compared the influence of shuffling of training data and dropout of certain nodes in MLPs and nodes and recurrent connections in LSTMs in layers during training. Results revealed that the ensemble model outperformed the persistence, ARIMA, MLP, and LSTM models across both pain medications. In general, not shuffling the training data and adding the dropout helped the MLP models and shuffling the training data and not adding the dropout helped the LSTM models across both medications. We highlight the implications of using statistical, neural, and ensemble methods for time-series forecasting of outcomes in the healthcare domain.
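The ensemble model above combines forecasts from the calibrated ARIMA, MLP, and LSTM models. Assuming a simple unweighted average (the abstract does not state the combination rule), the idea reduces to:

```python
def ensemble_forecast(preds_by_model):
    # preds_by_model: {model_name: [weekly predictions]}; all same horizon
    series = list(preds_by_model.values())
    horizon = len(series[0])
    return [sum(p[t] for p in series) / len(series) for t in range(horizon)]

# hypothetical weekly expenditure forecasts from three calibrated models
preds = {
    "arima": [10.0, 11.0, 12.0],
    "mlp":   [9.0, 12.0, 12.5],
    "lstm":  [11.0, 10.0, 11.5],
}
combined = ensemble_forecast(preds)   # [10.0, 11.0, 12.0]
```

Averaging tends to cancel the uncorrelated errors of the individual models, which is one common explanation for ensembles outperforming their members.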
- Research Article
1
- 10.12928/biste.v5i4.9668
- Jan 8, 2024
- Buletin Ilmiah Sarjana Teknik Elektro
This study conducts a comparative analysis of three prominent machine learning models: Multi-Layer Perceptrons (MLP), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN) with Long Short-Term Memory (LSTM) in the field of automatic speech recognition (ASR). This research is distinct in its use of the LibriSpeech 'test-clean' dataset, selected for its diversity in speaker accents and varied recording conditions, establishing it as a robust benchmark for ASR performance evaluation. Our approach involved preprocessing the audio data to ensure consistency and extracting Mel-Frequency Cepstral Coefficients (MFCCs) as the primary features, crucial for capturing the nuances of human speech. The models were meticulously configured with specific architectural details and hyperparameters. The MLP and CNN models were designed to maximize their pattern recognition capabilities, while the RNN (LSTM) was optimized for processing temporal data. To assess their performance, we employed metrics such as precision, recall, and F1-score. The MLP and CNN models demonstrated exceptional accuracy, with scores of 0.98 across these metrics, indicating their effectiveness in feature extraction and pattern recognition. In contrast, the LSTM variant of RNN showed lower efficacy, with scores below 0.60, highlighting the challenges in handling sequential speech data. The results of this study shed light on the differing capabilities of these models in ASR. While the high accuracy of MLP and CNN suggests potential overfitting, the underperformance of LSTM underscores the necessity for further refinement in sequential data processing. This research contributes to the understanding of various machine learning approaches in ASR and paves the way for future investigations. We propose exploring hybrid model architectures and enhancing feature extraction methods to develop more sophisticated, real-world ASR systems. 
Additionally, our findings underscore the importance of considering model-specific strengths and limitations in ASR applications, guiding the direction of future research in this rapidly evolving field.
- Research Article
35
- 10.3390/app121910156
- Oct 10, 2022
- Applied Sciences
With the advancement in pose estimation techniques, human posture detection recently received considerable attention in many applications, including ergonomics and healthcare. When using neural network models, overfitting and poor performance are prevalent issues. Recently, convolutional neural networks (CNNs) were successfully used for human posture recognition from human images due to their superior multiscale high-level visual representations over hand-engineering low-level characteristics. However, calculating millions of parameters in a deep CNN requires a significant number of annotated examples, which prohibits many deep CNNs such as AlexNet and VGG16 from being used on issues with minimal training data. We propose a new three-phase model for decision support that integrates CNN transfer learning, image data augmentation, and hyperparameter optimization (HPO) to address this problem. The model is used as part of a new decision support framework for the optimization of hyperparameters for AlexNet, VGG16, CNN, and multilayer perceptron (MLP) models for accomplishing optimal classification results. The AlexNet and VGG16 transfer learning algorithms with HPO are used for human posture detection, while CNN and Multilayer Perceptron (MLP) were used as standard classifiers for contrast. The HPO methods are essential for machine learning and deep learning algorithms because they directly influence the behaviors of training algorithms and have a major impact on the performance of machine learning and deep learning models. We used an image data augmentation technique to increase the number of images to be used for model training to reduce model overfitting and improve classification performance using the AlexNet, VGG16, CNN, and MLP models. The optimal combination of hyperparameters was found for the four models using a random-based search strategy. The MPII human posture datasets were used to test the proposed approach. 
The proposed models achieved an accuracy of 91.2% using AlexNet, 90.2% using VGG16, 87.5% using CNN, and 89.9% using MLP. The study is the first HPO study executed on the MPII human pose dataset.
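The random-based search strategy mentioned above can be sketched in a few lines: sample configurations uniformly from the hyperparameter grid, score each, and keep the best. The search space, sample budget, and scoring function below are illustrative stand-ins, not the paper's actual setup:

```python
import random

random.seed(42)

search_space = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size":    [16, 32, 64],
    "dropout":       [0.0, 0.25, 0.5],
}

def sample_config(space):
    # draw one value uniformly at random per hyperparameter
    return {name: random.choice(values) for name, values in space.items()}

def toy_score(cfg):
    # stand-in for a real train/validate cycle: favors mid-range settings
    return -abs(cfg["learning_rate"] - 1e-3) - abs(cfg["dropout"] - 0.25)

best_cfg, best_score = None, float("-inf")
for _ in range(20):                       # fixed trial budget
    cfg = sample_config(search_space)
    score = toy_score(cfg)
    if score > best_score:
        best_cfg, best_score = cfg, score
```

In the real setting, `toy_score` would be replaced by validation accuracy after training the AlexNet/VGG16/CNN/MLP model under the sampled configuration.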
- Research Article
- 10.1177/20552076251393338
- Oct 30, 2025
- Digital Health
Objective: To fill the benchmarking gap in clinician–patient sentiment analysis, we compare deep learning, transformer, and ensemble models for three-class (low/medium/high) sentiment classification in doctor–patient consultations. Methods: We used a publicly available dataset of 3325 anonymized doctor–patient consultations from the Hugging Face repository (mahfoos/Patient-Doctor-Conversation) labeled as low, medium, or high severity. Preprocessing included text cleaning, tokenization, and padding; class balancing was applied only within the training split of each fold. Models evaluated were long short-term memory (LSTM), bidirectional LSTM (BiLSTM), convolutional neural networks (CNN), CNN–LSTM, and bidirectional encoder representations from transformers (BERT); an ensemble (hard voting over Logistic Regression, Random Forest, and Support Vector Classifier (SVC)) was also tested. Evaluation used stratified five-fold cross-validation, with metrics reported as mean ± SD across outer test folds (accuracy; macro-averaged precision/recall/F1). Interpretability was examined via BERT attention and feature attributions. Results: The ensemble achieved the highest accuracy (75.5 ± 0.5), outperforming BERT (66.98 ± 0.6), CNN–LSTM (65.68 ± 0.9), CNN (64.17 ± 0.8), BiLSTM (64.82 ± 0.7), and LSTM (58.66 ± 0.19). Class-wise analysis showed robust detection of high-severity interactions (e.g. ensemble F1 = 90.8 ± 1.3), while low-severity remained most challenging; the ensemble improved class 0 recall (58.7 ± 1.0), and BERT provided the highest class 0 precision (65.5 ± 1.0). Conclusion: Under stratified five-fold cross-validation, ensemble learning delivered the strongest and most balanced performance for three-class sentiment classification of clinician–patient dialogue, while transformers offered complementary precision on difficult cases. Attention- and feature-attribution analyses improved transparency, supporting clinical interpretability.
Future work should scale to larger, multimodal (text/audio/vision) and multilingual datasets, and develop privacy-preserving, lightweight models for real-time deployment in clinical settings.
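The hard-voting ensemble described in the Methods can be illustrated compactly: each base classifier (Logistic Regression, Random Forest, SVC) casts one label, and the majority label wins. The tie-breaking rule below is an illustrative choice; the abstract does not specify one:

```python
from collections import Counter

SEVERITIES = ["low", "medium", "high"]

def hard_vote(votes):
    # votes: one predicted severity per base classifier (e.g. LR, RF, SVC)
    counts = Counter(votes)
    # ties resolve to the earlier class in SEVERITIES (deterministic rule)
    return max(SEVERITIES, key=lambda c: counts[c])

prediction = hard_vote(["high", "medium", "high"])   # majority says "high"
```

With three base classifiers and three classes, a three-way tie is possible, which is why an explicit tie rule matters in practice.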
- Research Article
38
- 10.1155/2021/5360828
- Jan 1, 2021
- Complexity
As the stock market is an important part of the national economy, more and more investors have begun to pay attention to methods for improving return on investment while effectively avoiding certain risks. Many factors affect the trend of the stock market, and the relevant information has the nature of a time series. This paper proposes a composite model, CNN-BiSLSTM, to predict the closing price of a stock. Bidirectional special long short-term memory (BiSLSTM), an improvement on bidirectional long short-term memory (BiLSTM), adds a 1 − tanh(x) function to the output gate, which helps the model better predict the stock price. The model extracts the advanced features that influence stock price through a convolutional neural network (CNN) and predicts the stock closing price through BiSLSTM after the data are processed by the CNN. To verify the effectiveness of the model, historical data of the Shenzhen Component Index from July 1, 1991, to October 30, 2020, are used to train and test the CNN-BiSLSTM. CNN-BiSLSTM is compared with multilayer perceptron (MLP), recurrent neural network (RNN), long short-term memory (LSTM), BiLSTM, CNN-LSTM, and CNN-BiLSTM. The experimental results show that the mean absolute error (MAE), root-mean-squared error (RMSE), and R-square (R2) evaluation indicators of the CNN-BiSLSTM are all optimal. Therefore, CNN-BiSLSTM can accurately predict the closing price of the Shenzhen Component Index on the next trading day, which can serve as a reference to help investors effectively avoid certain risks.
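The evaluation indicators named in the abstract (MAE, RMSE, and R2) are standard and easy to state concretely. A minimal sketch, using small hypothetical price series rather than the paper's Shenzhen Component Index data:

```python
import math

def mae(y, yhat):
    # mean absolute error
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def rmse(y, yhat):
    # root-mean-squared error
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def r2(y, yhat):
    # coefficient of determination: 1 - SS_res / SS_tot
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot

# hypothetical closing-price actuals vs. model predictions
actual    = [10.0, 12.0, 11.0, 13.0]
predicted = [10.5, 11.5, 11.0, 12.5]
```

Lower MAE/RMSE and higher R2 are better, which is the sense in which the abstract calls the CNN-BiSLSTM indicators "all optimal".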
- Research Article
16
- 10.3390/w7062707
- Jun 5, 2015
- Water
The objective of this study is to develop artificial neural network (ANN) models, including multilayer perceptron (MLP) and Kohonen self-organizing feature map (KSOFM), for spatial disaggregation of areal rainfall in the Wi-stream catchment, an International Hydrological Program (IHP) representative catchment, in South Korea. A three-layer MLP model, using three training algorithms, was used to estimate areal rainfall. The Levenberg–Marquardt training algorithm was found to be more sensitive to the number of hidden nodes than were the conjugate gradient and quickprop training algorithms using the MLP model. Results showed that network structures of 11-5-1 (conjugate gradient and quickprop) and 11-3-1 (Levenberg–Marquardt) were the best for estimating areal rainfall using the MLP model. The network structures of 1-5-11 (conjugate gradient and quickprop) and 1-3-11 (Levenberg–Marquardt), which are the inverses of the best areal-rainfall-estimation networks, were identified for spatial disaggregation of areal rainfall using the MLP model. The KSOFM model was compared with the MLP model for spatial disaggregation of areal rainfall. Both the MLP and KSOFM models could disaggregate areal rainfall into individual point rainfall with spatial concepts.
- Research Article
2
- 10.1002/htj.23163
- Aug 27, 2024
- Heat Transfer
Electric vehicles encounter significant challenges in colder climates due to reduced battery efficiency at low temperatures and increased electricity demand for cabin heating, which impacts vehicle propulsion. This study aims to address these challenges by implementing a thermal management system utilizing Phase Change Materials (PCMs) and validating the performance of a Multilayer Perceptron (MLP) model in predicting PCM behavior and battery temperature distributions. The study employs an MLP model trained with 160 samples of diverse heat inputs, including pulsating, constant, Wiener, discharging, and random temperatures. The model uses these temperatures as inputs and liquid fractions as target values. Performance evaluation is conducted on the MATLAB platform and is benchmarked against existing approaches, such as Long Short-Term Memory (LSTM), spatiotemporal convolutional neural network (CNN), and pooled CNN-LSTM. The MLP model's accuracy in predicting PCM phase transitions is validated by comparing predicted liquid fractions with numerically obtained values. Additionally, this study forecasts temperature distributions within a standard battery pack under various discharge scenarios, considering the performance of commercial lithium-ion batteries. The proposed MLP model demonstrates high efficacy, achieving a correlation of up to 0.999 and a root mean squared error below 0.013 compared with numerical results.
- Research Article
14
- 10.1007/s00024-015-1065-2
- Mar 28, 2015
- Pure and Applied Geophysics
The reduction in visibility during fog significantly influences surface as well as air transport operations. The prediction of fog remains difficult despite improvements in numerical weather prediction models. The present study aims at identifying a suitable neural network model with proper architecture to provide precise nowcasts of horizontal visibility during fog over the airports of three significantly affected metropolises of India, namely: Kolkata (22°32′N; 88°20′E), Delhi (28°38′N; 77°12′E) and Bengaluru (12°95′N; 77°72′E). The investigation shows that the multilayer perceptron (MLP) model provides considerably less error in nowcasting visibility during fog over the said metropolises than the radial basis function network, generalized regression neural network or linear neural network. The MLP models of different architectures are trained with the data and records from 2000 to 2010. The model results are validated with observations from 2011 to 2014. Our results reveal that MLP models with different configurations [(1) four input layers, three hidden layers with three hidden nodes in each layer, and a single output; (2) four input layers with two hidden layers having one hidden node in the first hidden layer and two hidden nodes in the second hidden layer, and a single output layer; and (3) four input layers with two hidden layers having two hidden nodes in each hidden layer and a single output layer] provide minimum error in nowcasting visibility during fog over the airports of Kolkata, Delhi and Bengaluru, respectively. The results show that the MLP model is well suited for nowcasting visibility during fog with a 6 h lead time; however, the study reveals that the MLP model is sensitive to dissimilar station altitudes in nowcasting visibility, as the minimum prediction error for the three metropolises, which have dissimilar mean sea level altitudes, is obtained through different configurations of the model.
- Book Chapter
11
- 10.1007/978-3-319-57454-7_41
- Jan 1, 2017
Neural network models have been demonstrated to be capable of achieving remarkable performance in sentiment classification. Convolutional neural networks (CNN) and recurrent neural networks (RNN) are the two mainstream architectures for this modelling task. In this work, a novel model based on the long short-term memory recurrent neural network (LSTM), called P-LSTM, is proposed for sentiment classification. In P-LSTM, a three-word phrase embedding is used instead of the single-word embedding that is often used. Besides, P-LSTM introduces a phrase factor mechanism that combines the feature vectors of the phrase embedding layer and the LSTM hidden layer to extract more exact information from the text. The experimental results show that P-LSTM achieves excellent performance on sentiment classification tasks.
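A phrase embedding over three-word windows can be sketched as follows. Averaging the word vectors is an illustrative composition choice, since the abstract does not specify how P-LSTM composes the three words:

```python
def phrase_embeddings(word_vecs, width=3):
    # slide a width-3 window over the sentence and average each window's
    # word vectors into one phrase vector (illustrative composition)
    dims = len(word_vecs[0])
    out = []
    for i in range(len(word_vecs) - width + 1):
        window = word_vecs[i:i + width]
        out.append([sum(v[d] for v in window) / width for d in range(dims)])
    return out

# four toy 2-dim word vectors yield two overlapping phrase vectors
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
phrases = phrase_embeddings(words)
```

The resulting phrase vectors, rather than individual word vectors, would then be fed to the LSTM.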
- Research Article
3
- 10.18280/ria.340418
- Sep 30, 2020
- Revue d'Intelligence Artificielle
The convolutional neural network (CNN) and the long short-term memory (LSTM) network are adept at extracting local and global features, respectively, and both can achieve excellent classification results. However, the CNN performs poorly at extracting the global contextual information of a text, while the LSTM often overlooks the features hidden between words. For text sentiment classification, this paper combines the CNN with a bidirectional LSTM (BiLSTM) into a parallel hybrid model called CNN_BiLSTM. Firstly, the CNN was adopted to quickly extract the local features of the text. Next, the BiLSTM was employed to obtain the global text features containing contextual semantics. After that, the features extracted by the two neural networks (NNs) were fused and processed by a Softmax classifier for text sentiment classification. To verify its performance, the CNN_BiLSTM was compared in experiments with single NNs like CNN and LSTM, as well as with other deep learning (DL) NNs. The experimental results show that the proposed parallel hybrid model outperformed the contrastive methods in F1-score and accuracy. Therefore, our model can solve text sentiment classification tasks effectively and offers greater practical value than other NNs.
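The fusion step described above, concatenating CNN and BiLSTM features and classifying with Softmax, reduces to a few lines. The feature vectors and weight matrix below are hypothetical and untrained:

```python
import math

def softmax(z):
    # numerically stable softmax over a list of logits
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def fuse_and_classify(cnn_feats, bilstm_feats, weights):
    # parallel fusion: concatenate local (CNN) and global (BiLSTM) features,
    # then apply a linear layer followed by Softmax
    x = cnn_feats + bilstm_feats
    logits = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return softmax(logits)

# hypothetical feature vectors and a 2-class weight matrix
cnn_feats    = [0.2, 0.7, 0.1]
bilstm_feats = [0.5, 0.3]
weights = [[0.1, 0.4, -0.2, 0.3, 0.0],
           [-0.3, 0.2, 0.5, -0.1, 0.4]]
probs = fuse_and_classify(cnn_feats, bilstm_feats, weights)
```

Because the two branches run in parallel rather than in sequence, each can specialize: the CNN on local n-gram patterns, the BiLSTM on sentence-level context.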