Abstract

The number of files stored on personal computers (PCs) is increasing rapidly, and locating information in this environment is difficult. In this paper we present a set of building blocks for implementing a desktop search system composed of four modules: an indexing mechanism, text analysis, index storage, and a searching mechanism. The implementation of these four modules is described in detail. In addition, we describe the implementation of the user interface and how the modules interact with each other.

Keywords - Desktop Search, Information Retrieval, indexing, searching, personal computer (PC).

I. INTRODUCTION

With the development of computer technology, computers can complete many kinds of complicated tasks. As a result, the number of files stored on the PC, such as digital photos, text files, and video and audio files, is increasing at an amazing rate. However, a new problem arises: computer users have to spend a great deal of time searching for useful information in an ocean of data, and sometimes even files that the user has previously seen or used cannot be found. The problem users currently face is therefore not how to save a file, but how to find and locate it as quickly as possible. In other words, traditional desktop information retrieval technology cannot meet the current needs of users, which will inevitably lead to the development of new desktop search engine technologies.

A desktop search engine is designed to help users find and locate the required information or documents on the PC effectively. Today, desktop search engine technologies are becoming more popular in the field of information retrieval. A full-text search engine is one that can search every word in a document. It first builds an index over each document; the search system then searches the index database according to the keywords entered by the user.
The search results are returned to the user in an order determined by a sorting algorithm. A characteristic of such a search engine is the huge amount of information it handles: it segments the whole content of each document and adds it to the index database, so that it achieves a high recall ratio. The full-text search engine also has the characteristics of a short cycle, rapid development, and low cost.

Today, massive numbers of documents are often stored on our PCs, and we often spend much time finding the documents we need. Desktop search systems index this large number of documents so that the needed documents can be located immediately, which addresses the difficulty of finding the right document.

With the number of open-source search engine tools available, we can design a personal desktop search engine conveniently and efficiently. Lucene is a popular open-source full-text search engine tool. The following introduces the characteristics and basic framework of Lucene. Lucene is a full-text search engine that can search a large number of documents for specified words. The characteristics of Lucene are (8): high performance of search; high scalability of target documents; morphological analyzer; phrase search, regular expressions, attribute search, and similarity search; multilingualism with Unicode; independence from file format and repository; intelligent web crawler; simple and powerful API.
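The index-then-search flow described above (analyze the text, store an index, look up query keywords, return sorted results) can be illustrated with a minimal sketch. This is not Lucene's API and not the authors' implementation. The names `analyze` and `InvertedIndex` are hypothetical, the index is kept in memory only, and ranking by summed term frequency is an assumption standing in for the unspecified sorting algorithm.

```python
import re
from collections import Counter, defaultdict


def analyze(text):
    """Text analysis: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """In-memory index storage: term -> {document id -> term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add_document(self, doc_id, text):
        """Indexing: record how often each term occurs in the document."""
        for term, freq in Counter(analyze(text)).items():
            self.postings[term][doc_id] = freq

    def search(self, query):
        """Searching: return ids of documents containing every query term,
        sorted by summed term frequency (highest first), then by id."""
        terms = analyze(query)
        if not terms:
            return []
        # Use .get so that unknown terms do not create empty postings entries.
        per_term = [self.postings.get(t, {}) for t in terms]
        matches = set(per_term[0])
        for docs in per_term[1:]:
            matches &= set(docs)
        # Assumed ranking: sum of the query terms' frequencies in each document.
        scores = {d: sum(docs[d] for docs in per_term) for d in matches}
        return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    index = InvertedIndex()
    index.add_document("notes.txt", "Lucene is a full-text search engine.")
    index.add_document("todo.txt", "Index the desktop files, then search the index.")
    index.add_document("photo.txt", "Holiday photo album")
    print(index.search("search"))
    print(index.search("index search"))
```

Because every word of every document goes into the postings table, any document containing the query terms can be found, which is the high-recall property the text attributes to full-text indexing. A real desktop search engine would additionally persist the index to disk and use a more refined scoring function.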
