An improved Approach for Document Retrieval Using Suffix Trees

N Sandhya, K Anuradha, A Govardhan, Y Sri

doi:10.14569/ijacsa.2011.020906

Abstract

A huge collection of documents is available within a few mouse clicks. The current World Wide Web is a web of pages. Users have to guess keywords that might lead, through search engines, to the pages that contain the information of interest, and then browse hundreds or even thousands of returned pages to obtain what they want. In our work we build a generalized suffix tree for our documents and propose a search technique for retrieving documents based on a kind of phrase called word sequences. Our proposed method efficiently searches for a given phrase (with missing or additional words in between) with better performance.

Keywords: Document retrieval; Frequent Word Sequences; Suffix tree; Traversal technique.

I. INTRODUCTION

With the growth of the web, hundreds of millions of people engage in the information retrieval (IR) process every day when they use a web search engine or search their emails. IR is fast becoming the dominant form of information access, overtaking traditional database-style searching. The IR process begins when a user enters a query, such as a search string or a phrase in a web search engine, to identify related documents or URLs. Almost all documents now have electronic copies. With the development of the WWW, retrieving documents through web search engines based on a query is an efficient technique, but it should not be time consuming. For this reason, the precision with which related documents are retrieved for a given query is vital for a search engine. Cluster-based information retrieval techniques also exist [11]. The next section deals with information retrieval and related work on text documents. Section 3 describes the suffix tree. Section 4 deals with building the generalized suffix tree. Section 5 explains the traversal technique algorithm used for quick retrieval of documents. Section 6 shows the experiment

Highlights

With the growth of the web, hundreds of millions of people engage in the information retrieval process every day when they use a web search engine or search their emails
The goal of this step is to reduce the dimension of the database by eliminating those words that are not frequent enough to be in a frequent k-word sequence, for k >= 2
After building the suffix tree as mentioned above, we traverse the tree for a given word sequence “eat chocolates”, which should retrieve all the documents that contain “children eat chocolates”, “children eat dry fruits and chocolates”, or “children of four years eat many chocolates”
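
The traversal in the last highlight can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes an uncompressed word-level suffix trie as a simplified stand-in for the generalized suffix tree, and the names `build_suffix_trie`, `search`, and `max_gap` (the largest number of extra words tolerated between two consecutive query words) are hypothetical choices made for this example.

```python
class Node:
    """One trie node: outgoing word edges plus the ids of documents passing through."""

    def __init__(self):
        self.children = {}   # word -> Node
        self.docs = set()    # ids of documents having a suffix that reaches this node


def build_suffix_trie(documents):
    """Insert every word-level suffix of every document into one shared trie."""
    root = Node()
    for doc_id, text in documents.items():
        words = text.lower().split()
        for start in range(len(words)):
            node = root
            for word in words[start:]:
                node = node.children.setdefault(word, Node())
                node.docs.add(doc_id)
    return root


def _match(node, words, max_gap):
    """Collect documents in which `words` follow `node` in order, with gaps allowed."""
    if not words:
        return set(node.docs)
    found = set()
    stack = [(node, 0)]  # (current node, words skipped since the last matched word)
    while stack:
        current, skipped = stack.pop()
        nxt = current.children.get(words[0])
        if nxt is not None:
            found |= _match(nxt, words[1:], max_gap)
        if skipped < max_gap:
            # Treat each outgoing word as an extra word and keep descending.
            for child in current.children.values():
                stack.append((child, skipped + 1))
    return found


def search(root, query, max_gap=3):
    """Return ids of documents containing the query words in order."""
    words = query.lower().split()
    if not words:
        return set()
    first = root.children.get(words[0])  # every suffix starts at the root
    if first is None:
        return set()
    return _match(first, words[1:], max_gap)


documents = {
    1: "children eat chocolates",
    2: "children eat dry fruits and chocolates",
    3: "children of four years eat many chocolates",
    4: "chocolates are sweet",
}
root = build_suffix_trie(documents)
print(sorted(search(root, "eat chocolates")))  # prints [1, 2, 3]
```

Because the trie stores every suffix uncompressed, it needs space quadratic in document length; a real generalized suffix tree compresses single-child paths. The sketch is only meant to show how matching in order with skipped words maps onto a tree traversal.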

Summary

INTRODUCTION

With the growth of the web, hundreds of millions of people engage in the information retrieval (IR) process every day when they use a web search engine or search their emails. The IR process begins when a user enters a query, such as a search string or a phrase in a web search engine. With the development of the WWW, retrieving documents through web search engines based on a query is an efficient technique. For this reason, the precision with which related documents are retrieved for a given query is vital for a search engine. Cluster-based information retrieval techniques exist [11]. The next section deals with information retrieval and related work on text documents.

RELATED WORK

SUFFIX TREE

Definition

CONSTRUCTION OF SUFFIX TREES FOR DOCUMENTS

Finding frequent 2-word sets
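
The highlights describe this step as reducing the dimension of the database by eliminating words that are not frequent enough to appear in a frequent k-word sequence, for k >= 2. The following Python sketch shows one way such a filter could look. It assumes that the support of a 2-word set is the number of documents containing both words and that `min_support` is an absolute document count; the paper's exact definitions are not shown here, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_two_word_sets(documents, min_support):
    """Return the unordered word pairs that co-occur in at least `min_support` documents."""
    pair_counts = Counter()
    for words in documents:
        unique = sorted(set(words))  # count each pair at most once per document
        pair_counts.update(combinations(unique, 2))
    return {pair for pair, count in pair_counts.items() if count >= min_support}


def prune_infrequent_words(documents, min_support):
    """Drop every word that belongs to no frequent 2-word set, keeping word order."""
    frequent_pairs = frequent_two_word_sets(documents, min_support)
    keep = {word for pair in frequent_pairs for word in pair}
    return [[word for word in words if word in keep] for words in documents]


documents = [
    ["children", "eat", "chocolates"],
    ["children", "eat", "dry", "fruits", "and", "chocolates"],
    ["dogs", "eat", "bones"],
]
print(prune_infrequent_words(documents, min_support=2))
```

A word outside every frequent 2-word set cannot be part of a longer frequent word sequence, so removing it before generating suffixes shrinks the tree without losing any frequent sequence.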

EXPERIMENTAL SETUP

Cleaning of documents and generating suffixes

Generating Suffixes and Building GST


Journal: International Journal of Advanced Computer Science and Applications
Publication Date: Jan 1, 2011
Citations: 5
License type: cc-by
