Word Embedding Technique Research Articles

Over the past decade, indoor localization systems have gained increasing attention and found widespread applications in commercial and research environments. Specifically, a Wi-Fi fingerprint-based system offers a low-cost alternative to counterparts such as Bluetooth, ultra-wideband (UWB), and radio frequency identification (RFID) technologies, owing to the ubiquity of Wi-Fi access points (WAPs) in most buildings. However, the main disadvantage of the fingerprint-based system is the intensive survey effort required during system initialization and maintenance. This work explores a solution to alleviate this limitation by considering a crowdsourcing approach for zone-level localization. Instead of relying only on the labelled fingerprint data from trained surveyors, this approach uses the more attainable unlabelled fingerprint data collected by participating volunteers. This unlabelled data is then used to augment the survey data in a process called pseudo-labelling, forming a more comprehensive training dataset for subsequent localization tasks; this semi-supervised approach allows for minimal survey effort during system initialization and maintenance. To enable such a solution, this work introduces a novel approach of employing non-contextual word embedding techniques to construct distributed vector representations of fingerprint data to overcome three challenges: (a) the high memory requirement in the downstream tasks due to high-dimensional non-distributed vector representations from the “standard” vector transformation, (b) the inclusion of an arbitrary value that represents missing WAPs, which can affect the performance of the downstream localization tasks in a non-transparent manner, and, most importantly, (c) poor pseudo-labelling and semi-supervised zone-prediction performance due to poor data separability in the feature space.
Choosing non-contextual text-embedding techniques over their contextual counterparts lowers the computational requirements of model training and distributed-representation generation, because the model architectures are simpler (no deep learning) and no pre-trained model is needed to generate the distributed representations. To this end, we considered non-contextual word embedding techniques commonly used in natural language processing, such as Word2Vec, GloVe, and Doc2Vec, for the distributed-representation transformation, and compared the resulting downstream performance with that of well-recognized dimensionality reduction techniques such as PCA, Isomap, and UMAP. The results show that Word2Vec and GloVe transformations outperform the other types of transformations in terms of separability in fingerprint representations, pseudo-labelling performance, and semi-supervised zone-prediction accuracy. Together with their promising robustness against potential data inhomogeneity, this makes Word2Vec and GloVe the recommended transformations for constructing vector representations of fingerprints in crowdsourcing zone-level localization.
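The pseudo-labelling step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which classifier or acceptance rule the authors use, so the nearest-centroid rule, the `max_distance` cut-off, the function names, and the toy 2-D vectors below are all assumptions made for illustration only.

```python
from math import dist


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def pseudo_label(labelled, unlabelled, max_distance):
    """Assign zone labels to unlabelled vectors by nearest zone centroid.

    labelled:     dict mapping zone name -> list of vectors (survey data)
    unlabelled:   list of vectors (crowdsourced data)
    max_distance: hypothetical cut-off; a vector farther than this from
                  every centroid is skipped rather than given a doubtful label
    """
    centroids = {zone: centroid(vs) for zone, vs in labelled.items()}
    accepted = []
    for vector in unlabelled:
        zone, d = min(
            ((z, dist(vector, c)) for z, c in centroids.items()),
            key=lambda pair: pair[1],
        )
        if d <= max_distance:
            accepted.append((vector, zone))
    return accepted


# Toy 2-D vectors standing in for fingerprint representations (made-up values).
survey = {
    "zone_A": [[0.0, 0.0], [0.2, 0.0]],
    "zone_B": [[1.0, 1.0], [1.0, 0.8]],
}
crowd = [[0.1, 0.1], [0.9, 1.0], [5.0, 5.0]]

pseudo = pseudo_label(survey, crowd, max_distance=0.5)

# Augment a copy of the survey data with the accepted pseudo-labelled vectors.
augmented = {zone: list(vs) for zone, vs in survey.items()}
for vector, zone in pseudo:
    augmented[zone].append(vector)
```

In this toy run the third crowdsourced vector lies far from both zones and is left out of the augmented training set. The sketch also suggests why separability matters: the better the zones separate in the feature space, the fewer unlabelled vectors receive a wrong or rejected label.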
HIGHLIGHTS

This work introduces a novel approach of employing non-contextual word-embedding techniques to construct distributed vector representations of Wi-Fi fingerprint data to facilitate pseudo-labelling and semi-supervised zone-prediction tasks in crowdsourcing zone-level localization.

The benefits of employing word-embedding techniques are (a) a lower memory requirement in the downstream tasks due to distributed vector representations, (b) no inclusion of an arbitrary value that represents missing WAPs, which can affect the performance of the downstream localization tasks in a non-transparent manner, and (c) improved pseudo-labelling and semi-supervised zone-prediction performance due to improved data separability in the feature space.

The benefit of employing non-contextual techniques, as opposed to their contextual counterparts, is a lower computational requirement in model training and distributed-representation generation, due to simpler model architectures (no deep learning) and no need for a pre-trained model during distributed-representation generation.

The results show that Word2Vec and GloVe transformations outperform other types of transformations in terms of separability in fingerprint representations, pseudo-labelling performance, and semi-supervised zone-prediction accuracy.
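To make the memory and missing-value points concrete, the sketch below contrasts a "standard" fixed-length vector with a token list of the kind a word-embedding model could be trained on. The abstract does not state how fingerprints are turned into tokens, so the tokenisation (one token per observed WAP, strongest signal first), the placeholder value of -110 dBm, and all names here are assumptions.

```python
MISSING_RSSI = -110  # arbitrary placeholder in dBm for unseen WAPs (assumed value)


def standard_vector(fingerprint, all_waps, missing=MISSING_RSSI):
    """One slot per known WAP; WAPs absent from the scan get the placeholder."""
    return [fingerprint.get(wap, missing) for wap in all_waps]


def to_tokens(fingerprint):
    """Observed WAP identifiers only, strongest signal first (assumed ordering).

    Ties in signal strength are broken alphabetically so the output is stable.
    """
    ranked = sorted(fingerprint.items(), key=lambda item: (-item[1], item[0]))
    return [wap for wap, _ in ranked]


# A hypothetical building with 200 known WAPs and one scan that saw three of them.
all_waps = [f"wap_{i:03d}" for i in range(200)]
scan = {"wap_007": -48, "wap_113": -71, "wap_042": -63}

dense = standard_vector(scan, all_waps)  # 200 values, 197 of them placeholders
tokens = to_tokens(scan)                 # 3 tokens, no placeholder needed
```

The dense vector grows with the number of WAPs in the building and is dominated by the placeholder, whereas the token list carries only what the scan observed. A model such as Word2Vec would be trained on many such token lists, treating each scan like a sentence.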


Asian and low-resource language information processing refers to the field of computational linguistics that aims to develop natural language processing (NLP) technologies for languages that have fewer available language resources or are less commonly spoken. This is an important field of study because many languages in Asia and other parts of the world are underrepresented in NLP, which may limit access to information and technology for speakers of these languages. The growing volume of user-generated content on the web has made sentiment analysis (SA) a significant tool for extracting data regarding human emotional states. Twitter sentiment detectors provide a superior solution for assessing the quality of products and services compared to other conventional technologies. The detection performance and classification accuracy of SA depend heavily on the classifier and on the quality of the input features. Deep learning (DL) methods use distinct techniques to extract information from raw data such as tweets or texts and represent it in different forms of models. Therefore, this article presents a Dwarf Mongoose Optimization with Deep Learning-Based Twitter Sentiment Classification (DMODL-TSC) technique to classify sentiments based on tweets. The presented DMODL-TSC technique leverages the concepts of NLP and DL. First, the raw tweets are preprocessed to transform them into a useful format. Next, the DMODL-TSC technique applies the FastText word embedding technique. A bidirectional recurrent neural network (BiRNN) is then used to recognize sentiments. Finally, the DMO technique is used to tune the hyperparameters of the BiRNN, which leads to effective classification performance. The DMODL-TSC technique was evaluated on three datasets, and the results indicate the superior performance of the DMODL-TSC approach.
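As background on the FastText step, FastText represents a word through its character n-grams, wrapped in `<` and `>` boundary markers, which is what lets it handle rare or misspelt words common in tweets. The sketch below shows only that n-gram extraction; it is not the DMODL-TSC pipeline, the function name is hypothetical, and the 3 to 6 range follows FastText's usual defaults rather than anything stated in the abstract. FastText additionally keeps the whole word as its own unit, which this sketch omits.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in '<' and '>' boundary markers."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the marked word.
        for start in range(len(marked) - n + 1):
            grams.append(marked[start:start + n])
    return grams


trigrams = char_ngrams("where", n_min=3, n_max=3)

# A misspelt word still shares n-grams with the correct spelling.
shared = set(char_ngrams("goood")) & set(char_ngrams("good"))
```

Because a word vector is built from the vectors of its n-grams, a misspelling such as "goood" still overlaps with "good" and can receive a related representation.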


Articles published on Word Embedding Technique

Identifying Phage Sequences From Metagenomic Data Using Deep Neural Network With Word Embedding and Attention Mechanism.

Arabic Toxic Tweet Classification: Leveraging the AraBERT Model

A detailed review on word embedding techniques with emphasis on word2vec

Automatic Construction of Educational Knowledge Graphs: A Word Embedding-Based Approach

Extractive Summarization of Telugu Text Using Modified Text Rank and Maximum Marginal Relevance

Statement Recognition of Access Control Policies in IoT Networks.

Distributed Representations of Wi-Fi Fingerprints from Non-Contextual Text-Embedding Techniques with Applications in Crowdsourcing Zone-Level Localization

Low-Resource Language Information Processing using Dwarf Mongoose Optimization with Deep Learning Based Sentiment Classification

Describe the house and I will tell you the price: House price prediction with textual description data

CoBiCo: A model using multi-stage ConvNet with attention-based Bi-LSTM for efficient sentiment classification

Robust Malware identification via deep temporal convolutional network with symmetric cross entropy learning

Detection of Software Security Weaknesses Using Cross-Language Source Code Representation (CLaSCoRe)

A comparison of word embedding-based extraction feature techniques and deep learning models of natural disaster messages classification

A comparative study on word embedding techniques for suicide prediction on COVID-19 tweets using deep learning models

A Low-Complexity Deep Learning Model for Predicting Targeted Sequencing Depth from Probe Sequence

Comparison of Data Models for Unsupervised Twitter Sentiment Analysis

Fake news detection based on a hybrid BERT and LightGBM models

DiMo: discovery of microRNA motifs using deep learning and motif embedding.

An analysis of the thematic evolution of ageing and healthcare expenditure using word embedding: A scoping review of policy implications

Framing climate change in Nature and Science editorials: applications of supervised and unsupervised text categorization
