Abstract

This paper explains similarity measures and the relationship between knowledge repositories. It also describes the significance of document similarity measures, the algorithms behind them, and the types of text to which each can be applied. Document similarity measures include full-text similarity, paragraph similarity, sentence similarity, semantic similarity, structural similarity, and statistical measures. Two frameworks are proposed in this paper: one measures document-to-document similarity, and the other measures the similarity between one document and multiple documents. Either model can be implemented with any one of these similarity measures; the implementation is put forth for further research.
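Both frameworks are said to accept any one of the similarity measures. The following is a minimal Python sketch of that pluggable design, not the authors' implementation. It assumes a plain lower-case word tokenizer and uses the Jaccard coefficient (one of the measures listed among the paper's sections) as the default. The names `tokens`, `doc_to_doc`, and `doc_to_many` are hypothetical.

```python
import re
from typing import Callable, Dict, List, Set, Tuple

# A similarity measure maps two texts to a score in [0, 1].
Similarity = Callable[[str, str], float]


def tokens(text: str) -> Set[str]:
    """Lower-case word tokens (a simple regex tokenizer, assumed for illustration)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(doc_a: str, doc_b: str) -> float:
    """Jaccard coefficient |A & B| / |A | B| over the two documents' token sets."""
    a, b = tokens(doc_a), tokens(doc_b)
    if not a and not b:
        return 1.0  # two empty documents are treated as identical
    return len(a & b) / len(a | b)


def doc_to_doc(doc_a: str, doc_b: str, measure: Similarity = jaccard) -> float:
    """Framework 1 (hypothetical sketch): similarity of a single document pair."""
    return measure(doc_a, doc_b)


def doc_to_many(query: str, corpus: Dict[str, str],
                measure: Similarity = jaccard) -> List[Tuple[str, float]]:
    """Framework 2 (hypothetical sketch): score one document against many, best first."""
    scores = [(doc_id, measure(query, text)) for doc_id, text in corpus.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    corpus = {
        "d1": "semantic similarity between documents",
        "d2": "machine translation of sentences",
        "d3": "document similarity using ontology and corpus",
    }
    print(doc_to_doc("document similarity", "document retrieval"))
    # Ranks d3 (2/7), then d1 (1/6), then d2 (0.0).
    print(doc_to_many("document similarity measures", corpus))
```

Because the measure is passed in as a function, a semantic or statistical measure could replace `jaccard` without changing either framework. Token-set Jaccard ignores word order and frequency, so "document" and "documents" do not match here.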

Highlights

  • Nowadays, information on the web is increasing rapidly

  • To resolve this issue, this paper proposes semantic-similarity-based document retrieval

  • Several natural language applications, such as information retrieval, information recommendation, and machine translation, require the similarity between sentences or documents

Summary

INTRODUCTION

Objectives: Nowadays, information on the web is increasing rapidly. With the growth of web-based information and of the number of internet users, it has become difficult for users to find the documents relevant to their particular needs. Several natural language applications, such as information retrieval, information recommendation, and machine translation, require the similarity between sentences or documents. Several recent applications of natural language processing demand an effective approach to calculating the similarity between sentences, as in [1]. The measure of similarity and relatedness can be extended to many types of entities, such as words, sentences, texts, concepts, or ontologies, depending on the requirement. Tasks such as document classification and clustering, information retrieval, and synonym extraction require precise measurement of semantic similarity between words. Because so many applications and domains depend on semantic similarity, measuring sentence and document similarity is of great significance. Calculating semantic similarity among entities has applications in several areas, such as recommendation systems, e-commerce, search engines, and biomedical informatics, and in natural language processing tasks such as word sense disambiguation. Short text similarity is important in applications such as text summarization as in [6], text categorization as in [7], and machine translation as in [8].

RELATED WORK
Hybrid Approaches
PROPOSED WORK
Jaccard Similarity Coefficient
Hybrid Approach For Measuring Document Similarity Using Ontology And Corpus
POS Tagger Using Hidden Markov Model
SWETO Ontology
RESULT
CONCLUSION