Vector Space Model Research Articles

• We propose a Capsule Semantic Graph (CSG) to represent news documents.
• The CSG can effectively capture the relationships between words and the semantics of news documents.
• We introduce a graph kernel to measure the similarity between CSGs.
• Our method better addresses the problems of news representation and similarity measurement.
• Our method is of great significance for topic detection from news.

Topic detection aims to discover valuable topics from the massive volume of online news. It can help people capture what is happening in the real world and alleviate the burden of information overload, and it has become increasingly important as online news grows explosively. Topic detection is typically transformed into a document clustering problem, whose core idea is to group news documents that report on the same topic into the same cluster based on document similarity. Because news documents have a complex structure and considerable length, measuring their similarity is very challenging. Existing term-based methods represent each news document by a set of informative keywords in a vector space model (VSM), and the relationship between documents is then calculated by cosine similarity. However, VSM ignores the relationships between words and suffers from sparse semantics, which leads to low precision in topic detection. In recent years, probabilistic methods and graph analytical methods have been proposed for topic detection; however, both have high time complexity. To cope with these problems, we first present a novel document representation approach based on graphical decomposition, which decomposes each news document into different semantic units and then constructs the relationships between these units to form a capsule semantic graph (CSG). Compared with the VSM representation, the CSG retains the relationships between words and alleviates the sparse semantics.
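For readers unfamiliar with the term-based baseline that the abstract contrasts with CSGs, the following Python sketch shows a TF-IDF vector space model compared by cosine similarity. It is a generic illustration, not the authors' code: the whitespace tokenization, the smoothed IDF formula and the function names `tfidf_vectors` and `cosine` are assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dict: term -> weight) for tokenized documents."""
    n = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append({
            # Smoothed IDF (assumption): log((1 + n) / (1 + df)) + 1 stays positive.
            term: (count / length) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [
    "earthquake hits city rescue teams arrive".split(),
    "rescue teams search city after earthquake".split(),
    "stock market rallies on tech earnings".split(),
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # positive: the two reports share keywords
print(cosine(vecs[0], vecs[2]))  # 0.0: no shared keywords
```

Because each dimension is a single surface term, two reports that describe the same event with different words (for example "quake" versus "earthquake") share no dimensions and score 0. This is the sparse-semantics problem the abstract attributes to VSM.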
We next introduce a graph kernel to measure the similarity between CSGs based on their substructures. Finally, we use an incremental clustering method to cluster the news documents, in which each document is represented by a CSG and the similarity between documents is calculated by the graph kernel. Experimental results on three standard datasets show that our method obtains higher precision, recall and F1 score than several state-of-the-art methods. Moreover, experimental results on a large news dataset show that our method (CSG-SM) has lower time complexity than probabilistic methods and graph analytical methods.
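The abstract does not detail how CSGs are built or which graph kernel is used, so the sketch below substitutes simpler, clearly hypothetical stand-ins to show how a kernel-based similarity can drive single-pass incremental clustering. Each document becomes a word co-occurrence graph (`cooccurrence_graph`, not the authors' CSG construction), similarity is a normalized linear kernel over shared edges (a basic edge-count substructure kernel, not the paper's kernel), and a document joins the most similar existing cluster only if that similarity reaches an assumed threshold.

```python
import math
from collections import Counter


def cooccurrence_graph(tokens, span=2):
    """Hypothetical CSG stand-in: undirected edges between each word and the
    next `span` words, stored as a Counter of sorted word pairs."""
    edges = Counter()
    for i, w in enumerate(tokens):
        for u in tokens[i + 1:i + 1 + span]:
            if u != w:  # skip self-loops
                edges[tuple(sorted((w, u)))] += 1
    return edges


def edge_kernel(g1, g2):
    """Linear kernel on edge-count features: counts shared edge substructures."""
    return sum(c * g2.get(e, 0) for e, c in g1.items())


def normalized_kernel(g1, g2):
    """Scale the kernel to [0, 1] so graphs of different sizes are comparable."""
    denom = math.sqrt(edge_kernel(g1, g1) * edge_kernel(g2, g2))
    return edge_kernel(g1, g2) / denom if denom else 0.0


def incremental_cluster(graphs, threshold=0.2):
    """Single pass: add each graph to its most similar cluster if the similarity
    reaches `threshold`, otherwise open a new cluster."""
    clusters = []  # member indices per cluster
    profiles = []  # summed edge counts per cluster
    for idx, g in enumerate(graphs):
        best, best_sim = None, 0.0
        for c, profile in enumerate(profiles):
            sim = normalized_kernel(g, profile)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            clusters[best].append(idx)
            profiles[best].update(g)  # fold the new document into the profile
        else:
            clusters.append([idx])
            profiles.append(Counter(g))
    return clusters


docs = [
    "earthquake hits city rescue teams arrive",
    "rescue teams arrive after earthquake hits city",
    "stock market rallies on tech earnings",
    "tech earnings lift stock market",
]
graphs = [cooccurrence_graph(d.split()) for d in docs]
print(incremental_cluster(graphs))  # [[0, 1], [2, 3]]
```

Swapping `edge_kernel` for a richer substructure kernel (for example a Weisfeiler-Lehman subtree kernel) would leave the clustering loop unchanged. The threshold of 0.2 is arbitrary and would need tuning on real data.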

Purpose: Natural languages have a fundamental quality of suppleness that makes it possible to express a single idea in many different ways. This feature is often exploited in academia, leading to the theft of work known as plagiarism. Many approaches have been put forward to detect such cases based on various text features and grammatical structures of languages. However, there is considerable scope for improvement in detecting intelligent plagiarism.

Design/methodology/approach: To address this, the paper introduces a hybrid model that detects intelligent plagiarism by breaking the process into three stages: (1) clustering; (2) vector formulation in each cluster based on semantic roles, normalization and similarity index calculation; and (3) summary generation using an encoder-decoder. An effective weighting scheme based on K-means, calculated on the synonym set of each term, is introduced to select the terms used to build the vectors. Only if the value calculated in the last stage lies above a predefined threshold is the next semantic argument analyzed. When the similarity score for two documents exceeds the threshold, a short summary of the plagiarized documents is created.

Findings: Experimental results show that this method can detect the connotation and concealment used in idea plagiarism in addition to literal plagiarism.

Originality/value: The proposed model can help academics stay up to date by providing summaries of relevant articles. It would eliminate the practice of plagiarism, which is infesting the academic community at an unprecedented pace. The model will also accelerate the review of academic documents, aiding the speedy publication of research articles.
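The abstract does not spell out its weighting scheme, but the general idea of comparing documents at the level of synonym sets and flagging a pair once a similarity score passes a threshold can be sketched as below. The toy `SYNSETS` dictionary, the raw-count vectors and the 0.8 threshold are all hypothetical; this is not the paper's K-means weighting, semantic-role analysis or encoder-decoder stage.

```python
import math
from collections import Counter

# Hypothetical toy synonym sets (word -> synset id); a real system might use WordNet.
SYNSETS = {
    "car": "vehicle", "automobile": "vehicle",
    "fast": "quick", "rapid": "quick",
    "buy": "purchase",
}


def concept_vector(tokens, synsets=SYNSETS):
    """Count tokens after replacing each word by its synonym-set id, if it has one."""
    return Counter(synsets.get(t, t) for t in tokens)


def cosine(u, v):
    """Cosine similarity between two count vectors stored as Counters."""
    dot = sum(c * v.get(t, 0) for t, c in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def is_suspicious(doc_a, doc_b, threshold=0.8):
    """Flag a document pair when synonym-normalized similarity reaches the threshold."""
    return cosine(concept_vector(doc_a), concept_vector(doc_b)) >= threshold


original = "people buy a fast car".split()
paraphrase = "people purchase a rapid automobile".split()
print(cosine(Counter(original), Counter(paraphrase)))  # roughly 0.4: only "people" and "a" match literally
print(is_suspicious(original, paraphrase))             # True
```

On literal words the paraphrase stays well below the threshold, while after synonym normalization both sentences map to the same concepts and the pair is flagged.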

Articles published on Vector Space Model

Discovery Model Based on Analogies for Teaching Computer Programming

Text Mining of Research Articles Using Clustering Approach

Web-Based Information Search System Development Using a Semantic Network

Analysis and implementation of the bi-polar slope one algorithm with the content base filtering method in producing culinary place recommendations in Kuningan Regency

Comparison of two methods on vector space model for trust in social commerce

Comparative Study on Feature-Based Scoring Using Vector Space Modelling System

Dynamic malware attack dataset leveraging virtual machine monitor audit data for the detection of intrusions in cloud

Frequency-Dependent Cortical Interactions during Semantic Processing: An Electrocorticogram Cross-spectrum Analysis Using a Semantic Space Model.

Deep phenotyping unstructured data mining in an extensive pediatric database to unravel a common KCNA2 variant in neurodevelopmental syndromes

A knowledge management-based engineering design system for highway design projects

LeDoCl: A Semantic Model for Legal Documents Classification using Ensemble Methods

A graphical decomposition and similarity measurement approach for topic detection from online news

Exploring semantic differences between the Indonesian prefixes PE- and PEN- using a vector space model

The search for science and technology verses in Qur’an and hadith

An open-source framework for ExpFinder integrating N-gram vector space model and μCO-HITS

Idea plagiarism detection with recurrent neural networks and vector space model

Soil-Moisture-Sensor-Based Automated Soil Water Content Cycle Classification With a Hybrid Symbolic Aggregate Approximation Algorithm

Identifying themes in fiction: A centroid-based lexical clustering approach

EMR2vec: Bridging the gap between patient data and clinical trial

Einstein tori and crooked surfaces

