Continuous Smartphone Authentication via Multimodal Biometrics and Optimized Ensemble Learning

Abstract

The ubiquity of smartphones has transformed them into primary repositories of sensitive data; however, traditional one-time authentication mechanisms create a critical trust gap by failing to verify identity post-unlock. Our aim is to mitigate these vulnerabilities and align with the Zero Trust Architecture (ZTA) framework and philosophy of “never trust, always verify,” as formally defined by the National Institute of Standards and Technology (NIST) in Special Publication 800-207. This study introduces a robust continuous authentication (CA) framework leveraging multimodal behavioral biometrics. A dedicated application was developed to synchronously capture touch, sliding, and inertial sensor telemetry. For feature modeling, a heterogeneous deep learning pipeline was employed to capture modality-specific characteristics, utilizing Convolutional Neural Networks (CNNs) for sensor data, Long Short-Term Memory (LSTM) networks for curvilinear sliding, and Gated Recurrent Units (GRUs) for discrete touch. To resolve performance degradation caused by class imbalance in Zero Trust environments, a Grid Search Optimization (GSO) strategy was applied to optimize a weighted voting ensemble, identifying the global optimum for decision thresholds and modality weights. Empirical validation on a dataset of 35,519 samples from 15 subjects demonstrates that the optimized ensemble achieves a peak accuracy of 99.23%. Sensor kinematics emerged as the primary biometric signature, followed by touch and sliding features. This framework enables high-precision, non-intrusive continuous verification, bridging the critical security gap in contemporary mobile architectures.
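
The weighted voting ensemble and grid search mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-modality scores, the 0.1 grid step, the use of balanced accuracy as the selection metric, and all function names are hypothetical. Note that an exhaustive grid search finds the best setting on the grid only, so the result depends on the chosen step.

```python
from itertools import product


def fuse(scores, weights):
    """Weighted average of per-modality genuine-user probabilities."""
    return sum(w * s for w, s in zip(weights, scores))


def balanced_accuracy(y_true, y_pred):
    """Mean of true-positive rate and true-negative rate (both classes must be present)."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    tpr = sum(pos) / len(pos)
    tnr = sum(1 - p for p in neg) / len(neg)
    return (tpr + tnr) / 2


def grid_search(samples, y_true, step=0.1):
    """Try every weight triple summing to 1 and every threshold on the grid."""
    n = round(1 / step)
    best = (-1.0, None, None)
    for i, j in product(range(n + 1), repeat=2):
        if i + j > n:
            continue  # the third weight would be negative
        weights = (i / n, j / n, (n - i - j) / n)
        for k in range(1, n):
            threshold = k / n
            y_pred = [int(fuse(s, weights) >= threshold) for s in samples]
            score = balanced_accuracy(y_true, y_pred)
            if score > best[0]:
                best = (score, weights, threshold)
    return best


# Toy validation data: (sensor, touch, sliding) probabilities per sample.
samples = [(0.9, 0.6, 0.4), (0.8, 0.7, 0.3), (0.2, 0.5, 0.6),
           (0.1, 0.4, 0.7), (0.3, 0.6, 0.5)]
labels = [1, 1, 0, 0, 0]  # 1 = genuine user, 0 = impostor
best_score, best_weights, best_threshold = grid_search(samples, labels)
```

Balanced accuracy is used here only because the abstract highlights class imbalance; plain accuracy or an equal error rate criterion would slot into the same loop.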

Similar Papers
  • Conference Article
  • 10.21437/iberspeech.2018-36
Bottleneck and Embedding Representation of Speech for DNN-based Language and Speaker Recognition
  • Nov 21, 2018
  • Alicia Lozano-Diez + 2 more

Automatic speech recognition has experienced breathtaking progress in the last few years, partially thanks to the introduction of deep neural networks into its approaches. This evolution in speech recognition systems has spread across related areas such as language and speaker recognition, where deep neural networks have noticeably improved performance. In this PhD thesis, we have explored different approaches to the tasks of speaker and language recognition, focusing on systems where deep neural networks become part of traditional pipelines, replacing some stages or the whole system itself. Specifically, in the first experimental block, end-to-end language recognition systems based on deep neural networks are analyzed, where the neural network is used directly as the classifier, without any other backend, performing the language recognition task from the scores (posterior probabilities) provided by the network. These research works focus on two architectures, convolutional neural networks and long short-term memory (LSTM) recurrent neural networks, which are less demanding in terms of computational resources due to their reduced number of free parameters in comparison with other deep neural networks. Thus, these systems constitute an alternative to classical i-vectors and achieve comparable results, especially when dealing with short utterances. In particular, we conducted experiments comparing a system based on convolutional neural networks with classical Factor Analysis GMM and i-vector reference systems, and evaluated them on two different tasks from the National Institute of Standards and Technology (NIST) Language Recognition Evaluation (LRE) 2009: one focused on language pairs and the other on multi-class language identification. Results show comparable performance for the convolutional neural network based approaches, and some improvements are achieved when fusing classical and neural network approaches.
We also present the experiments performed with LSTM recurrent neural networks, which have proven their ability to model time-dependent sequences. We evaluate our LSTM-based language recognition systems on different subsets of the NIST LRE 2009 and 2015, where LSTM systems are able to outperform the reference i-vector system, providing a model with fewer parameters, although more prone to overfitting and not able to generalize as well as i-vectors on mismatched datasets. In the second experimental block of this dissertation, we explore one of the most prominent applications of deep neural networks in speech processing, which is their use as feature extractors. In this kind of system, deep neural networks are used to obtain a frame-by-frame representation of the speech signal, the so-called bottleneck feature vector, which is learned directly by the network and is then used instead of traditional acoustic features as input to language and speaker recognition systems based on i-vectors. This approach revolutionized these two fields, since it substantially outperformed classical systems that had been state of the art for many years (i-vectors based on acoustic features). Our analysis focuses on how different configurations of the neural network used as bottleneck feature extractor, which is trained for automatic speech recognition, influence the performance of the resulting features for language and speaker recognition. For language recognition, we compare bottleneck features from networks that vary in depth (number of hidden layers), in the position of the bottleneck layer where the information is compressed, and in the number of units (size) of this layer, all of which influence the representation obtained by the network.
With the set of experiments performed on bottleneck features for speaker recognition, we analyzed the influence of the type of features used to feed the network, their pre-processing and, in general, the optimization of the network for the task of feature extraction for speaker recognition, which might not coincide with the optimal configuration for automatic speech recognition (ASR). Finally, the third experimental block of this thesis proposes a novel approach for language recognition, in which the neural network is used to extract a fixed-length utterance-level representation of speech segments known as an embedding, which is able to replace the classical i-vector and overcomes the variable-length sequence of features provided by the bottleneck features. This embedding-based approach has recently shown promising results for speaker verification tasks, and our proposed system was able to outperform a strong state-of-the-art reference i-vector system on the latest challenging language recognition evaluations organized by NIST in 2015 and 2017. Thus, we analyze language recognition systems based on embeddings, and explore different deep neural network architectures and data augmentation techniques to improve the results of our system. In general, these embeddings are a fair competitor to the well-established i-vector pipeline, allowing the whole i-vector model to be replaced by a deep neural network. Furthermore, the network is able to extract information complementary to that contained in the i-vectors, even from the same input features. All this leads us to consider this contribution an interesting research line to explore in other fields.

  • Conference Article
  • Cite Count: 1
  • 10.1109/iccect57938.2023.10140414
An Investigation into the Detection of Human Scratching Activity Based on Deep Learning Models
  • Apr 28, 2023
  • Kevin Wang

Because pruritus is often overlooked and undertreated in the clinical setting, a major unmet need is objective measures of scratching behavior that quantify itch severity and frequency, since scratching directly correlates with itch. Such methods to measure itch and how itch severity changes over time are needed to objectively study and understand pruritus, develop and assess the efficacy of new medications, quantify disease severity in patients, and monitor treatment response. Wearable sensors in the form of wrist actigraphy, which detects wrist movements over time using micro-accelerometers, are the most studied and tested method to detect scratching events. To address this need, seven deep learning models are trained and tested for scratch detection: Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) – Gated Recurrent Unit (GRU), RNN – Long Short-Term Memory (LSTM), CNN & RNN – GRU (end-to-end), CNN & RNN – LSTM (end-to-end), CNN & RNN – GRU (parallel), and CNN & RNN – LSTM (parallel). The final results show that deep learning can accurately detect scratching (the CNN achieved an accuracy of 0.996) in various situations and can provide useful information (time, frequency, scratched body part, etc.) about scratching behavior during day and nighttime, in order to better quantify pruritus for use in the medical field.

  • Research Article
  • Cite Count: 34
  • 10.3390/machines12120927
Research on a Bearing Fault Diagnosis Method Based on a CNN-LSTM-GRU Model
  • Dec 17, 2024
  • Machines
  • Kaixu Han + 2 more

In view of the problem of the insufficient performance of deep learning models in time series prediction and poor comprehensive space–time feature extraction, this paper proposes a diagnostic method (CNN-LSTM-GRU) that integrates convolutional neural network (CNN), long short-term memory (LSTM) network, and gated recurrent unit (GRU) models. In this study, a convolutional neural network (CNN) model is used to process two-dimensional image data in both time and frequency domains, and a convolutional core attention mechanism is introduced to extract spatial features, such as peaks, cliffs, and waveforms, from the samples. A long short-term memory (LSTM) network is embedded in the output processing of the convolutional neural network (CNN) to analyze the long-sequence variation characteristics of rolling bearing vibration signals and enable long-term time series prediction by capturing long-term dependencies in the sequence. In addition, a gated recurrent unit (GRU) is used to refine long-term time series predictions, providing local fine-tuning and improving the accuracy of fault diagnosis. Using a dataset obtained from Case Western Reserve University (CWRU), the average accuracy of CNN-LSTM-GRU fault vibration is greater than 99%, and its superior performance in a noisy environment is demonstrated.

  • Research Article
  • Cite Count: 5
  • 10.11591/ijece.v14i3.pp3313-3319
Hybrid deep learning model for YouTube spam comment detection
  • Jun 1, 2024
  • International Journal of Electrical and Computer Engineering (IJECE)
  • Muhammad Sam'An + 1 more

Social media platforms, including YouTube and Facebook, allow users to interact through comments and videos. However, the openness of these platforms also makes them susceptible to spammers engaging in phishing, malware distribution, and advertisement dissemination. In response, our study introduces an innovative technique for detecting features indicative of spam within comments associated with shared videos. The initial phase involves data collection from the University of California, Irvine (UCI) machine learning repository and preprocessing using tokenization and lemmatization. Subsequently, a rigorous feature selection process is executed, and experiments are conducted with various proposed classification models. The performance evaluation demonstrates outstanding accuracy in identifying spam comments on YouTube: convolutional neural network with gated recurrent unit (CNN-GRU) at 95.92%, convolutional neural network with long short-term memory (CNN-LSTM) at 95.41%, convolutional neural network with bidirectional long short-term memory (CNN-biLSTM) at 96.43%, gated recurrent unit (GRU) at 95.41%, long short-term memory (LSTM) at 94.13%, bidirectional long short-term memory (biLSTM) at 96.94%, and convolutional neural network (CNN) at 94.64%. These results highlight the substantial contribution of our approach to spam detection and the fortification of online security.

  • Book Chapter
  • Cite Count: 6
  • 10.1007/978-3-030-55180-3_28
Comparison of Hybrid Recurrent Neural Networks for Univariate Time Series Forecasting
  • Aug 25, 2020
  • Anibal Flores + 2 more

The work presented in this paper aims to improve the accuracy of forecasting models for univariate time series. To this end, we experiment with different two- and four-layer hybrid models based on recurrent neural networks such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). The experiments use two time series, downward thermal infrared and all-sky insolation incident on a horizontal surface, obtained from NASA’s repository. In the first time series, the two-layer hybrid models (LSTM + GRU and GRU + LSTM) outperformed the non-hybrid models (LSTM + LSTM and GRU + GRU), while only two of six four-layer hybrid models (GRU + LSTM + GRU + LSTM and LSTM + LSTM + GRU + GRU) outperformed the non-hybrid models (LSTM + LSTM + LSTM + LSTM and GRU + GRU + GRU + GRU). In the second time series, only one of the two hybrid models (LSTM + GRU) outperformed the two non-hybrid models (LSTM + LSTM and GRU + GRU), and none of the four-layer hybrid models exceeded the results of the non-hybrid models.

  • Research Article
  • Cite Count: 6
  • 10.13031/aea.15867
Research on Greenhouse Environment Prediction Based on GCAKF-CNN-LSTM
  • Jan 1, 2024
  • Applied Engineering in Agriculture
  • Tianhong Liu + 3 more

Highlights: A GCAKF-CNN-LSTM model is proposed for greenhouse temperature and humidity forecasting. Grey correlation analysis is used to select the most relevant variables. A Kalman filter is applied for denoising to improve the data quality. The proposed model achieves higher forecasting accuracy with the lowest forecasting errors.
Abstract. Accurate prediction of temperature and humidity in the greenhouse environment helps to regulate the environment and promote crop growth. To address the nonlinear and strongly coupled characteristics of the greenhouse environment, this article proposes a hybrid greenhouse temperature and humidity prediction model based on preprocessing algorithms, Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM) networks. Firstly, grey correlation analysis (GCA) is used to screen the data features and analyze the factors affecting the temperature and humidity in the greenhouse. Secondly, the data is denoised by the Kalman filter (KF) to reduce noise interference. Thirdly, the local connection and weight sharing features of the CNN are applied to obtain effective features from the series, and the long- and short-term dependence relationships of the data are learned by the LSTM networks. Finally, the proposed model is validated on the greenhouse data. Experimental results demonstrated that, compared with Back Propagation (BP), Gated Recurrent Units (GRU), and LSTM, the RMSE of temperature prediction results was reduced by 31.5%, 21.6%, and 14.4%, and the MAE was reduced by 48.5%, 41.0%, and 32.3%, respectively. The RMSE of humidity prediction results decreased by 28.3%, 2.73%, and 0.63%, and the MAE decreased by 69.4%, 54.5%, and 10.8%, respectively. The proposed model can improve prediction accuracy and provide a decision basis for improving the timeliness of the greenhouse environmental control system.
Keywords: Convolutional neural network, Greenhouse environment prediction, Kalman filter, Long short-term memory network.

  • Conference Article
  • Cite Count: 20
  • 10.1109/icb.2016.7550050
Tattoo detection based on CNN and remarks on the NIST database
  • Jun 1, 2016
  • Qingyong Xu + 4 more

Detecting tattoo images stored in information technology (IT) devices of suspects is an important but challenging task for law enforcement agencies. Recently, the U.S. National Institute of Standards and Technology (NIST) held a challenge and released a tattoo database for the commercial and academic community in advancing research and development into automated image-based tattoo recognition technology. The best tattoo detection result in the NIST challenge was achieved by MorphoTrak with accuracy of 96.3%. This paper aims to answer three questions. 1) Is the NIST database suitable for training algorithms to detect tattoo images stored in IT devices of suspects? 2) Can convolutional neural networks (CNNs) outperform the MorphoTrak's algorithm? 3) How do training databases impact on tattoo detection performance? The NIST tattoo detection database containing 2,349 images and a database containing 10,000 collected from Flickr are utilized to answer these questions. The Flickr images taken in diverse environments and poses are used to simulate images stored in the IT devices. A CNN is trained on the NIST and Flickr images for this study. The experimental results demonstrate that the CNN outperforms the MorphoTrak's algorithm by 2.5%, achieving accuracy of 98.8% on the NIST database. When the CNN is trained on the NIST database to detect Flickr images, the accuracy drops to 65.8%. It implies that the NIST database is not an ideal database for training algorithms to detect tattoo images in IT devices of suspects. However, when the training database size increases, the detection performance improves.

  • Conference Article
  • Cite Count: 1
  • 10.1109/eict54103.2021.9733457
Malware Detection Using Neural Networks
  • Dec 17, 2021
  • Humaira Hossain + 5 more

The Internet hosts a large amount of data and files that need to be analyzed for possible malicious purposes, since the number of malicious applications is growing at a rapid rate. Researchers have tried to detect malware using neural networks and deep learning methods, which we discuss in the related works section. In this paper, we analyze and contrast the performance of three different neural network models for malware detection: Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) network, and Gated Recurrent Unit (GRU). We used a secondary dataset in our research. Of these models, the CNN performs best, giving 83 percent accuracy in recognizing malware, whereas LSTM and GRU give 65 percent and 76 percent, respectively.

  • Research Article
  • 10.55041/ijsrem35669
Sentiment Analysis of Textual Data using Deep Learning
  • Jun 15, 2024
  • INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
  • Artificial Intelligence & Machinelearning Department Of Computer Science And Engineering Malla Reddy University, Hyderabad,Telangana,India

Textual data is generated in large volumes every day over the internet, which makes sentiment analysis important. Sentiment analysis of textual data captures the feelings and attitudes of individuals toward particular topics or products. It extracts the sentiment polarity (negative, neutral, or positive) from textual data using deep learning algorithms. Sentiment analysis of textual data involves using a neural network architecture to analyze and determine the sentiment expressed in text. It is a subfield of text classification that involves analyzing people's opinions, emotions, and attitudes towards entities and their characteristics as expressed in written text. This work utilizes three deep learning algorithms: Neural Networks, Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU). The RNN, LSTM, and GRU models obtain an excellent rate of accuracy. It can be concluded from the outcomes that the preprocessing stages used had a positive impact on the accuracy rate. Keywords: neural network architectures employed for sentiment analysis, such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, Gated Recurrent Units (GRUs), Transformer models

  • Research Article
  • Cite Count: 6
  • 10.1080/15435075.2024.2448301
Prediction of electric vehicle battery state of health estimation using a hybrid deep learning mechanism
  • Jan 4, 2025
  • International Journal of Green Energy
  • Akshat Kant + 2 more

Lithium-ion batteries (LIBs) are widely employed, but fluctuations in temperature, overcharging, and overdischarging reduce their service lifetime. Battery health issues such as accelerated deterioration, loss of capacity, and thermal runaway can also endanger battery safety and functionality. This paper presents the integration of a Bidirectional Recurrent Neural Network and Long Short-Term Memory (biRNN-LSTM) network to improve the prediction of Li-ion battery State of Health (SoH) through complex pattern identification and higher prediction accuracy. Compared to traditional feed-forward neural networks, RNNs are designed to learn temporal dependencies and perform sequence recognition on the original data. LSTM modules build on this by retaining long-term time series information, which helps address problems such as vanishing gradients. To highlight the effectiveness of the proposed method, it is compared with the Deep Convolutional Neural Network and Long Short-Term Memory (DCNN-LSTM), Gated Recurrent Unit (GRU), and Long Short-Term Memory (LSTM) models from the literature, using the Root Mean Square Error (RMSE), Maximum Accuracy Error (MAE), and Maximum Error (MAX) assessment metrics for performance evaluation. GRU needs 8000 iterations to identify SoH estimation errors because it is less capable of learning long-term dependencies. The proposed technique can detect errors after 7000 iterations since it performs exceptionally well in capturing fine-grained temporal dynamics.

  • Research Article
  • Cite Count: 16
  • 10.3389/fenrg.2022.945769
Computationally Inexpensive 1D-CNN for the Prediction of Noisy Data of NOx Emissions From 500 MW Coal-Fired Power Plant
  • Aug 16, 2022
  • Frontiers in Energy Research
  • Muhammad Waqas Saif-Ul-Allah + 11 more

Coal-fired power plants have been used to meet the energy requirements in countries where coal reserves are abundant and are the key source of NOx emissions. Owing to the serious environmental and health concerns associated with NOx emissions, much work has been carried out to reduce them. Sophisticated artificial intelligence (AI) techniques have been employed during the past few decades, such as least-squares support vector machine (LSSVM), artificial neural networks (ANN), long short-term memory (LSTM), and gated recurrent unit (GRU), to develop NOx prediction models. Several studies have investigated deep neural network (DNN) models for accurate NOx emission prediction. However, there is a need to investigate a DNN-based NOx prediction model that is both accurate and computationally inexpensive. Recently, a new AI technique, convolutional neural network (CNN), has been introduced and proven superior for image class prediction accuracy. To the best of the authors’ knowledge, not much work has been done on the use of CNNs for NOx emissions from coal-fired power plants. Therefore, this study investigated the prediction performance and computational time of one-dimensional CNN (1D-CNN) on NOx emissions data from a 500 MW coal-fired power plant. The variations of hyperparameters of LSTM, GRU, and 1D-CNN were investigated, and performance metrics such as RMSE and computational time were recorded to obtain optimal hyperparameters. The obtained optimal values of hyperparameters of LSTM, GRU, and 1D-CNN were then employed for model development, and the models were subsequently evaluated on test data. The 1D-CNN NOx emission model improved the training efficiency in terms of RMSE by 70.6% and 60.1% compared to LSTM and GRU, respectively. Furthermore, the testing efficiency for 1D-CNN improved by 10.2% and 15.7% compared to LSTM and GRU, respectively.
Moreover, 1D-CNN (26 s) reduced the training time by 83.8% and 50% compared to LSTM (160 s) and GRU (52 s), respectively. Results reveal that 1D-CNN is more accurate, more stable, and computationally inexpensive compared to LSTM and GRU on NOx emission data from the 500 MW power plant.
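
The training-time reductions quoted above follow directly from the reported timings, as this short check illustrates. The helper name `percent_reduction` is hypothetical; only the three timings (26 s, 160 s, 52 s) come from the abstract.

```python
def percent_reduction(baseline_seconds, new_seconds):
    """Relative reduction of new_seconds with respect to baseline_seconds, in percent."""
    return (baseline_seconds - new_seconds) / baseline_seconds * 100


# Reported training times: 1D-CNN 26 s, LSTM 160 s, GRU 52 s.
reduction_vs_lstm = percent_reduction(160, 26)  # about 83.75, reported as 83.8%
reduction_vs_gru = percent_reduction(52, 26)    # 50, as reported
```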

  • Research Article
  • 10.12962/j24068535.v19i2.a1080
IMPROVED LIP-READING LANGUAGE USING GATED RECURRENT UNITS
  • Jul 31, 2021
  • JUTI: Jurnal Ilmiah Teknologi Informasi
  • Nafa Zulfa + 2 more

Lip-reading is one of the most challenging studies in computer vision. This is because lip-reading requires a large amount of training data, high computation time and power, and word length variation. Currently, the previous methods, such as Mel Frequency Cepstrum Coefficients (MFCC) with Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) with LSTM, still obtain low accuracy or long-time consumption because they use LSTM. In this study, we solve this problem using a novel approach with high accuracy and low time consumption. In particular, we propose to develop lip language reading by utilizing face detection, lip detection, filtering the amount of data to avoid overfitting due to data imbalance, image extraction based on CNN, voice extraction based on MFCC, and training model using LSTM and Gated Recurrent Units (GRU). Experiments on the Lip Reading Sentences dataset show that our proposed framework obtained higher accuracy when the input array dimension is deep and lower time consumption compared to the state-of-the-art.

  • Conference Article
  • Cite Count: 12
  • 10.1109/comnetsat56033.2022.9994409
GRU-MF: A Novel Appliance Classification Method for Non-Intrusive Load Monitoring Data
  • Nov 3, 2022
  • Aji Gautama Putrada + 3 more

Appliance classification using non-intrusive load monitoring (NILM) data is a growing research interest. Various studies in the field have used methods such as long short-term memory (LSTM), recurrent neural network (RNN), convolutional neural network (CNN), and deep neural network (DNN). However, there is a research opportunity to apply a gated recurrent unit (GRU), which is good for low-frequency data, with filtering mode (MF) for smoothing prediction results. This study proposes a novel GRU - MF method for classifying electricity appliances using power data from NILM. The first step in this research is to get NILM data. We use power data from the dishwasher, heater, refrigerator, and lighting. Then the first stage of data pre-processing consists of auto-correlation and time series-data transformation processes. The second stage of pre-processing data consists of normalization, standardization, label encoding, and one hot encoding process. The next stage is GRU training, where we compare the GRU with four benchmark methods: LSTM, CNN, DNN, and RNN. We tested the performance of our proposed model with Accuracy, Precision, and Recall. Finally, we implement MF to improve the performance of our appliance classification model. The test results show that our novel method is better than the LSTM, RNN, CNN, and DNN models. The GRU model itself has an Accuracy equal to 0.96 on test data. Once combined into GRU-MF, we achieve the Accuracy of 0.98 in real data.
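
The smoothing step can be illustrated with a small sketch, assuming that "MF" denotes a sliding-window mode (majority) filter applied to the sequence of predicted labels. The window size, the tie-break rule, and the function name are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def mode_filter(labels, window=3):
    """Replace each label with the most common label in a centered window."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        start, stop = max(0, i - half), min(len(labels), i + half + 1)
        counts = Counter(labels[start:stop])
        top = max(counts.values())
        if counts[labels[i]] == top:
            # On a tie, keep the original prediction.
            smoothed.append(labels[i])
        else:
            smoothed.append(counts.most_common(1)[0][0])
    return smoothed


# Hypothetical per-timestep classifier output with one isolated misprediction.
raw = ["fridge", "fridge", "heater", "fridge", "fridge",
       "heater", "heater", "heater"]
smoothed = mode_filter(raw)
```

In this toy sequence the isolated "heater" at the third position is replaced by "fridge", while the sustained run of "heater" at the end is left unchanged.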

  • Research Article
  • Cite Count: 20
  • 10.1038/s41598-025-05103-z
Enhancing agricultural commodity price forecasting with deep learning
  • Jul 1, 2025
  • Scientific Reports
  • R L Manogna + 2 more

Accurate forecasting of agricultural commodity prices is essential for market planning and policy formulation, especially in agriculture-dependent economies like India. Price volatility, driven by factors such as weather variability and market demand fluctuations, poses significant forecasting challenges. This study evaluates the performance of traditional stochastic models, machine learning techniques, and deep learning approaches in forecasting the prices of 23 commodities using daily wholesale price data from January 2010 to June 2024. Models assessed include Autoregressive Integrated Moving Average (ARIMA), Support Vector Regression, Extreme Gradient Boosting, Multilayer Perceptron, Recurrent Neural Networks, Long Short-Term Memory Networks, Gated Recurrent Units, and Echo State Networks. Results show that deep learning models, particularly Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRU), outperform the others in capturing complex temporal patterns, achieving superior accuracy across error metrics. For instance, the GRU model achieved a Root Mean Squared Error (RMSE) of 369.54 for onions and 210.35 for tomatoes, significantly outperforming the ARIMA model, which recorded RMSE values of 1564.62 and 1298.60, respectively. Furthermore, the Mean Absolute Percentage Error (MAPE) for GRU was notably lower, at 14.59% for onions and 10.58% for tomatoes. These results underscore the efficacy of deep learning approaches in addressing the inherent volatility and nonlinear dynamics of agricultural commodity prices. These findings offer valuable insights for policymakers, traders, and farmers, enabling better market interventions, crop planning, and risk management.
The study recommends exploring hybrid models and incorporating external factors like weather data to further enhance forecasting reliability.

  • Research Article
  • Cite Count: 34
  • 10.11591/eei.v13i1.6059
An Adam based CNN and LSTM approach for sign language recognition in real time for deaf people
  • Feb 1, 2024
  • Bulletin of Electrical Engineering and Informatics
  • Subrata Kumer Paul + 7 more

Hand gestures and sign language are crucial modes of communication for deaf individuals. Since most people can't understand sign language, it's hard for a mute and an average person to talk to each other. Because of technological progress, computer vision and deep learning can now be used to count. This paper shows two ways to use deep learning to recognize sign language. These methods help regular people understand sign language and improve their communication. Based on American Sign Language (ASL), two separate datasets have been constructed; the first has 26 signs, and the other contains three significant symbols with the crucial sequence of frames or videos for regular communication. This study looks at three different models: the improved ResNet-based convolutional neural network (CNN), the long short-term memory (LSTM), and the gated recurrent unit (GRU). The first dataset is used to fit and assess the CNN model. With the adaptive moment estimation (Adam) optimizer, the CNN obtains an accuracy of 89.07%. In contrast, the second dataset is given to the LSTM and GRU models, and a comparison is conducted. LSTM does better than GRU in all classes: LSTM has a 94.3% accuracy, while GRU only manages 79.3%. The real-time performance of our preliminary models is also highlighted.
