Word2vec Model Research Articles

Spam reviews pose a significant challenge to the integrity of online platforms, misleading consumers and undermining the credibility of genuine feedback. This paper introduces a framework for detecting AI-generated spam reviews that combines Deep Learning algorithms with Natural Language Processing (NLP) techniques. The framework applies four Deep Learning models to capture intricate patterns in textual data: Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, Gated Recurrent Units (GRUs), and Bidirectional LSTM (BiLSTM). To detect deceptive patterns in large volumes of review content, it represents the text with three embedding techniques: One-Hot Encoding, Word2Vec, and Term Frequency-Inverse Document Frequency (TF-IDF). Combining the three embedding techniques with the four Deep Learning algorithms yields twelve experiments on detecting AI-generated spam reviews. The experimental results show that the approach outperforms traditional machine learning models, offering a robust way to help ensure the authenticity of online reviews. Models using Word2Vec embeddings performed best, and the BiLSTM model with Word2Vec (BiLSTM_Word2Vec) achieved the highest scores overall: an accuracy of 98.46%, a precision of 0.98, a recall of 0.97, and an F1-score of 0.98, reflecting a near-perfect balance between precision and recall. Its high F2-score (0.9810), which weights recall more heavily, and F0.5-score (0.9857), which weights precision more heavily, further indicate that it detects AI-generated spam accurately while keeping false positives low, making it the most reliable option for this task. The Word2Vec-based LSTM model also performed very well, with an accuracy of 97.58%, a precision of 0.97, a recall of 0.96, and an F1-score of 0.97.
The CNN model with Word2Vec delivered similarly strong results, with an accuracy of 97.61%, a precision of 0.97, a recall of 0.96, and an F1-score of 0.97. This study is distinctive in targeting spam reviews generated specifically by AI-based tools, rather than spam reviews in general or AI-generated text in general. It contributes to spam detection research a scalable, efficient, and accurate framework that can be integrated into various online platforms to strengthen user trust and support decision-making.
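The F-scores quoted in this abstract are instances of the F-beta measure, the weighted harmonic mean of precision P and recall R: F_β = (1 + β²)·P·R / (β²·P + R). Setting β = 1 gives the F1-score, β = 2 weights recall more heavily, and β = 0.5 weights precision more heavily. The minimal Python sketch below computes it; the function name `f_beta` and the precision/recall values in the usage lines are illustrative assumptions, not taken from the study. Because the abstract reports precision and recall rounded to two decimals and does not state how scores were averaged across classes, plugging those rounded values in will not reproduce its four-decimal F2 and F0.5 figures exactly.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score: the weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily; beta < 1 weights precision more heavily.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    if beta <= 0:
        raise ValueError("beta must be positive")
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        # Only possible when precision and recall are both zero; define the score as 0.
        return 0.0
    return (1 + b2) * precision * recall / denominator


if __name__ == "__main__":
    # Hypothetical classifier whose precision is higher than its recall.
    p, r = 0.75, 0.5
    for beta in (0.5, 1.0, 2.0):
        print(f"F{beta:g} = {f_beta(p, r, beta):.4f}")
    # F0.5 = 0.6818
    # F1 = 0.6000
    # F2 = 0.5357
```

When precision exceeds recall, as in the BiLSTM_Word2Vec results, F0.5 ≥ F1 ≥ F2; the ordering reverses when recall exceeds precision.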


Patient portal messages often relate to specific clinical phenomena (e.g., patients undergoing treatment for breast cancer) and, as a result, have received increasing attention in biomedical research. Analyzing these messages requires natural language processing. Word embedding models such as word2vec can extract meaningful signals from text, but they are not readily applicable to patient portal messages: embedding models typically require millions of training samples to represent semantics adequately, whereas the volume of patient portal messages associated with a particular clinical phenomenon is often relatively small. We introduce a novel adaptation of the word2vec model for small-scale message collections, PK-word2vec (where PK stands for prior knowledge). PK-word2vec improves the training process by incorporating, as prior knowledge, the most similar terms for medical words (including problems, treatments, and tests) and non-medical words taken from two pre-trained embedding models. We applied PK-word2vec in a case study of patient portal messages in the Vanderbilt University Medical Center electronic health record system, sent by patients diagnosed with breast cancer between December 2004 and November 2017. We evaluated the model through a set of 1000 tasks, each of which compared the relevance of a given word to two groups: the five most similar words generated by PK-word2vec and the five most similar words generated by the standard word2vec model. We recruited 200 Amazon Mechanical Turk (AMT) workers and 7 medical students to perform the tasks. The dataset comprised 1389 patient records and 137,554 messages containing 10,683 unique words. Prior knowledge was available for 7981 non-medical and 1116 medical words.
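Both the prior knowledge drawn from the pre-trained models and the evaluation tasks rest on lists of a word's "most similar" terms. In word2vec-style models such lists are usually produced by ranking the vocabulary by cosine similarity to the query word's vector, which is what gensim's `KeyedVectors.most_similar` does. The minimal Python sketch below, assuming that cosine-based ranking, returns the top-k neighbours of a word. The helper names `cosine` and `most_similar` and the toy three-dimensional vectors are hypothetical, and the sketch does not reproduce how PK-word2vec feeds these neighbours into training, which the abstract does not specify.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word: str, vectors: Dict[str, Sequence[float]], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k words closest to `word` by cosine similarity, excluding the word itself."""
    if word not in vectors:
        raise KeyError(f"{word!r} is not in the vocabulary")
    query = vectors[word]
    scored = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != word]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical toy embeddings; real word2vec vectors are usually much longer.
    toy = {
        "chemo": [0.90, 0.10, 0.00],
        "chemotherapy": [0.88, 0.12, 0.02],
        "radiation": [0.70, 0.30, 0.10],
        "mammogram": [0.20, 0.90, 0.10],
        "appointment": [0.10, 0.20, 0.90],
        "fatigue": [0.45, 0.10, 0.55],
    }
    for neighbour, score in most_similar("chemo", toy, k=5):
        print(f"{neighbour:12s} {score:.3f}")
```

In the evaluation described above, two such five-word lists, one per model, would be shown side by side for the same query word.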
In over 90% of the tasks, both groups of reviewers judged that PK-word2vec generated more similar words than standard word2vec (p = 0.01). The difference between the evaluations by AMT workers and by medical students was negligible across all comparisons of task choices between the two groups of reviewers (p = 0.774 under a paired t-test). PK-word2vec can effectively learn word representations from a small message corpus, marking a significant advancement in processing patient portal messages.
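The comparison between AMT workers and medical students uses a paired t-test, which works on the differences d_i between matched pairs of measurements: t = d̄ / (s_d / √n) with n − 1 degrees of freedom, where d̄ and s_d are the mean and sample standard deviation of the n differences. The abstract does not say exactly which per-task quantities were paired, so the minimal Python sketch below (the name `paired_t` and the example scores are hypothetical) only shows how the statistic is computed. Turning t into a p-value needs the t-distribution's CDF, which the Python standard library lacks; `scipy.stats.ttest_rel` returns both the statistic and the p-value if SciPy is available.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for a paired t-test of x against y."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    if len(x) < 2:
        raise ValueError("need at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Invented agreement scores for the same five tasks, rated by two reviewer groups.
    group_a = [0.92, 0.88, 0.95, 0.90, 0.91]
    group_b = [0.91, 0.89, 0.94, 0.90, 0.92]
    t, df = paired_t(group_a, group_b)
    print(f"t = {t:.3f}, df = {df}")
```

A t statistic near zero, as with these invented scores, corresponds to a large p-value, which is the pattern the abstract reports (p = 0.774) for the two reviewer groups.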



Articles published on Word2vec Model

Ancient Text Translation Model Optimized with GujiBERT and Entropy-SkipBERT

Sustainable sentiment analysis on E-commerce platforms using a weighted parallel hybrid deep learning approach for smart cities applications

The construction of a Chinese fine-grained sentiment dictionary for Chinese domestic investors (CN-FSD) and its application

AI-Generated Spam Review Detection Framework with Deep Learning Algorithms and Natural Language Processing

Preservation of emotional context in tweet embeddings on social networking sites

Unveiling Similarities in the Code of Life: A Detailed Exploration of DNA Sequence Matching Algorithm

Exploring Deep Learning Models for Lyric Generation and Addressing Biases in Word Embeddings

Green public procurement and corporate environmental performance: An empirical analysis based on data from green procurement contracts

Sentiment Analysis of Satisfaction with Gojek Online Transportation Services Using the Extreme Learning Machine Algorithm

Exploiting web content semantic features to detect web robots from weblogs

Optimizing word embeddings for small datasets: a case study on patient portal messages from breast cancer patients

Advancing equity in breast cancer care: natural language processing for analysing treatment outcomes in under-represented populations

Semantic Similarity Measure Using a Combination of Word2Vec and WordNet Models

Algorithms for Finding Verbal Markers of Identity in Contemporary Scientific Discourse

Research on the evolution of cross-platform online public opinion for public health emergencies considering stakeholders.

Corporate financial distress prediction using the risk-related information content of annual reports

Values That Are Explicitly Present in Fairy Tales: Comparing Samples from German, Italian and Portuguese Traditions

Optimizing Word Embeddings for Patient Portal Message Datasets with a Small Number of Samples.

On Embedding Implementations in Text Ranking and Classification Employing Graphs

Mapping of Alzheimer’s disease related data elements and the NIH Common Data Elements
