Abstract

Text in articles is based on the expert opinion of a large number of people, including the views of the authors. These views are shaped by cultural or community aspects, which makes extracting information from text very difficult. This paper introduces how to use the capabilities of a modified graph-based Self-Organizing Map (SOM) to show text similarities. Text similarities are extracted from an article using Google's PageRank algorithm. Sentences from an input article are represented as a graph model instead of a vector space model. The resulting graph can be displayed as a visual animation of the execution of eight well-known graph algorithms, with animation speed control. The resulting graph is then used as input to the SOM, whose clustering algorithm constructs knowledge from the text data. According to the similarity measure, an adjustable number of the most similar sentences are arranged in visual form. In addition, this paper presents a wide variety of text search options. We compared our project with well-known clustering and visualization projects in terms of purity, entropy, and F-measure. Our project showed acceptable results and, in most cases, superiority over the other projects.
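The graph representation and PageRank step described above can be illustrated with a minimal sketch. It assumes bag-of-words cosine similarity as the edge weight and a standard weighted PageRank with a damping factor of 0.85; the function names (`build_sentence_graph`, `pagerank`) and the example sentences are hypothetical, and the modified method used in the paper may differ.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_sentence_graph(sentences):
    """Return a symmetric weight matrix: entry [i][j] is the similarity of sentences i and j."""
    bags = [Counter(tokenize(s)) for s in sentences]
    n = len(bags)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weights[i][j] = weights[j][i] = cosine_similarity(bags[i], bags[j])
    return weights


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration; isolated nodes spread their rank uniformly."""
    n = len(weights)
    out_sum = [sum(row) for row in weights]
    ranks = [1.0 / n] * n
    for _ in range(iterations):
        dangling = sum(ranks[i] for i in range(n) if out_sum[i] == 0.0)
        new_ranks = []
        for j in range(n):
            incoming = sum(
                ranks[i] * weights[i][j] / out_sum[i]
                for i in range(n)
                if out_sum[i] > 0.0
            )
            new_ranks.append((1.0 - damping) / n + damping * (incoming + dangling / n))
        ranks = new_ranks
    return ranks


# Hypothetical input: three related sentences and one unrelated sentence.
sentences = [
    "The cat sat on the mat.",
    "A cat was sitting on a mat.",
    "The dog chased the cat.",
    "Stock prices fell sharply today.",
]
graph = build_sentence_graph(sentences)
scores = pagerank(graph)
for score, sentence in sorted(zip(scores, sentences), reverse=True):
    print(f"{score:.3f}  {sentence}")
```

In this sketch the unrelated sentence shares no words with the others, so it has no edges and receives the lowest rank, while sentences that are similar to many others rank higher.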

Highlights

  • A context can be composed of different sets of vocabulary and still express the same meaning

  • The aim of this paper is to build a graph-based unsupervised clustering method, based on the Self-Organizing Map (SOM), for context extracted from text, with the goal of semantic representation

  • We randomly selected 1000 sentences from the one million sentences, computed their exact 1–10 nearest neighbors in the whole article and used SOM methods with different parameter settings to measure the impact on the clustering quality, comparing it to a manual exact clustering
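The parameter study in the last highlight can be sketched as follows. This is only an illustration under stated assumptions: it uses a plain rectangular SOM on small numeric vectors (not the modified graph-based SOM of the paper), a linearly decaying learning rate and neighbourhood radius, and purity against hypothetical manual labels as the quality measure. All names, data, and parameter values are made up for the example.

```python
import math
import random
from collections import Counter


def best_matching_unit(grid, x):
    """Return the grid cell whose weight vector is closest to x (squared Euclidean distance)."""
    return min(grid, key=lambda cell: sum((w - v) ** 2 for w, v in zip(grid[cell], x)))


def train_som(data, rows=2, cols=2, epochs=200, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM; returns a dict mapping (row, col) to a weight vector."""
    rng = random.Random(seed)
    dim = len(data[0])
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.1)  # shrinking neighbourhood radius
        for x in rng.sample(data, len(data)):    # visit samples in random order
            bmu = best_matching_unit(grid, x)
            for cell, w in grid.items():
                d2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))  # Gaussian neighbourhood
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return grid


def purity(assignments, labels):
    """Fraction of items that carry the majority label of their assigned cluster."""
    clusters = {}
    for cell, label in zip(assignments, labels):
        clusters.setdefault(cell, []).append(label)
    majority = sum(Counter(members).most_common(1)[0][1] for members in clusters.values())
    return majority / len(labels)


# Hypothetical feature vectors with hypothetical manual cluster labels.
data = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [0.9, 1.0], [1.0, 0.9], [0.95, 0.95]]
labels = ["a", "a", "a", "b", "b", "b"]

# Compare clustering quality across different map sizes.
for rows, cols in [(1, 2), (2, 2)]:
    som = train_som(data, rows=rows, cols=cols)
    assignments = [best_matching_unit(som, x) for x in data]
    print((rows, cols), round(purity(assignments, labels), 3))
```

Purity is only one of the three measures named in the abstract; entropy and F-measure could be computed from the same `assignments` and `labels` pairs.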


Summary

Introduction

A context can be composed of different sets of vocabulary and still express the same meaning. The vocabulary of a text documented in an article or a webpage is subject to the opinions of a large number of people, including the views of the authors. It reflects different cultural or community aspects, which makes extracting information from it very difficult. Text analysis is based on a high-level descriptive function of the context, like the matrix structure presented in [10] or the graphical representation based on the text of the metadata attributes described in [20]. This descriptive content can be clearly found on encyclopedia websites such as Wikipedia, Encyclopedia.com, and Webopedia.

