Vector Space Model Research Articles

This paper presents a new focused crawler that efficiently supports the Turkish language. The architecture is divided into multiple units: a control unit, a crawler unit, a link extractor unit, a link sorter unit, and a natural language processing unit. The crawler's units can work in parallel to process the massive number of published websites. The proposed Convolutional Neural Network (CNN) based natural language processing unit classifies Turkish text and web pages effectively. Extensive experiments on three datasets were performed to evaluate the approach. The first dataset contains 50,000 Turkish web pages downloaded by the developed crawler, while the other two are publicly available and consist of 28,567 and 22,431 Turkish web pages, respectively. In addition, the Vector Space Model (VSM) in general, and state-of-the-art word embedding techniques in particular, were investigated to find the most suitable representation for the Turkish language. Overall, the results indicate that the approach achieves good performance, robustness, and stability when processing Turkish. Bidirectional Encoder Representations from Transformers (BERT) was found to be the most appropriate embedding for building an efficient Turkish text classification system. Finally, the experiments showed that the developed natural language processing unit outperforms seven state-of-the-art CNN classification systems: its accuracy is 10% higher than that of the second-best system and 47% higher than that of the lowest-performing one.


Text classification is one of the classic tasks in natural language processing. The goal is to identify the category to which a text belongs. Text classification is widely used in email detection, sentiment analysis, topic labeling, and other fields. Good text representation is the key to improving the performance of natural language processing tasks such as text classification. Traditional text representation adopts the bag-of-words model or the vector space model, which not only loses the context information of the text but also suffers from high dimensionality and high sparsity. In recent years, with the growth of data and the improvement of computing performance, the use of deep learning to represent and classify texts has attracted great attention. Convolutional neural networks, recurrent neural networks, and recurrent neural networks with an attention mechanism have been used to represent text for classification and other natural language processing tasks, and all of them perform better than the traditional methods. In this paper, we design two sentence-level text representation and classification models based on deep networks. The details are as follows. (1) A text representation and classification model based on bidirectional recurrent and convolutional neural networks (BRCNN). BRCNN's input is the word vector corresponding to each word in the sentence. A recurrent neural network first extracts word-order information from the sentence, and a convolutional neural network then extracts higher-level sentence features. After convolution, a max-pooling operation produces the sentence vector, and a softmax classifier performs the classification. The recurrent network captures the word-order information in sentences, while the convolutional network extracts useful features. Experiments on eight text classification tasks show that the BRCNN model obtains a better text feature representation, and its classification accuracy is equal to or higher than that of prior methods. (2) A text representation and classification model based on an attention mechanism and a convolutional neural network (ACNN). The ACNN model uses a recurrent neural network with an attention mechanism to obtain the context vector, and a convolutional neural network then extracts higher-level feature information. A max-pooling operation produces the sentence vector, and a softmax classifier classifies the text. Experiments on eight text classification benchmark datasets show that ACNN improves the stability of model convergence and converges to an optimal or locally optimal solution better than BRCNN.
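The limitations that the abstract attributes to the bag-of-words and vector space representations (lost context, high dimensionality, high sparsity) can be illustrated with a small sketch. The code below is not taken from the paper: the example sentences and the helper names `bag_of_words` and `cosine` are assumptions chosen for illustration, and it uses raw term counts rather than any particular weighting scheme.

```python
from collections import Counter
import math

def bag_of_words(text, vocabulary):
    """Map a text to a term-count vector over a fixed vocabulary (hypothetical helper)."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

def cosine(u, v):
    """Cosine similarity between two equal-length count vectors (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Illustrative sentences (assumed, not from the paper's datasets).
docs = [
    "the dog bit the man",
    "the man bit the dog",
    "stocks fell sharply today",
]

# The vocabulary is every distinct token; each document becomes one vector over it.
vocabulary = sorted({word for doc in docs for word in doc.lower().split()})
vectors = [bag_of_words(doc, vocabulary) for doc in docs]

# Word order is discarded: the first two sentences get identical vectors.
same_words = cosine(vectors[0], vectors[1])

# Documents with no shared terms are orthogonal.
different = cosine(vectors[0], vectors[2])

# Fraction of zero entries in one vector; this grows as the vocabulary grows.
sparsity = sum(x == 0 for x in vectors[2]) / len(vocabulary)

print(vectors[0] == vectors[1], same_words, different, sparsity)
```

Even in this toy vocabulary, half of each vector's entries are zero, and two sentences with opposite meanings are indistinguishable. With a realistic vocabulary of tens of thousands of terms, each document vector has that many dimensions and is almost entirely zeros, which is the dimensionality and sparsity problem that motivates learned dense representations.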



Articles published on Vector Space Model

A hunger-based scheduling strategy for distributed crawler

Fuzzy RDF Knowledge Graph Embeddings Through Vector Space Model

Multimedia information retrieval using artificial neural network

Study and innovation of effective classification of XML documents using an advanced deep learning approach

Constructing a Shariah Document Screening Prototype Based on Serverless Architecture

Automated clustering and switching algorithms applied to semantic verbal fluency data in schizophrenia spectrum disorders

An Ensemble and Multi-View Clustering Method Based on Kolmogorov Complexity

An Emotional Analysis Method Based on Multi Model Ensemble Learning

Automatic quantitative assessment of English writing proficiency based on multi-feature fusion

Improved Electro Search Algorithm with Intelligent Controller Control System: ESPID Algorithm

A Topic-Specific Web Crawler using Deep Convolutional Networks

A Novel Patent Knowledge Extraction Method for Innovative Design

Retracted: Clustering Merchants and Accurate Marketing of Products Using the Segmentation Tree Vector Space Model

Difference Analysis in Risk Focus Based on Text Disclosure Frequency: A Triple Perspectives of Bank Managers, Financial Analysts, and Credit Raters

Word Embeddings and Semantic Spaces in Natural Language Processing

Research On Text Classification Based On Deep Neural Network

Topic Modelling Using VSM-LDA For Document Summarization

Semantic-Based K-Means Clustering for IMDB Top 100 Movies

Mengiplementasikan Vector Space Model Similarity Euclidean Distance Menggunakan TFIDF Pada Klasifikasi Teks Bahasa Indonesia

Identification and Filtering of Web Spams Using a Machine Learning Method

