A PSO Algorithm Based Web Page Retrieval System

Arpit Deo, Jayesh Gangrade, Shweta Gangrade

doi:10.2139/ssrn.3365545

Abstract

Webpage retrieval is the process of finding and presenting the webpages most relevant to a user's need from a very large collection of web resources. The tremendous growth of information on the Internet has made retrieval a tedious and difficult task for users, and this information overload calls for better techniques to retrieve the most relevant content from the web. This paper presents a webpage retrieval system based on the Particle Swarm Optimization (PSO) algorithm. To extract text from web documents, the system first removes all HTML tags. Stop words and special characters are then removed from the extracted text so that only meaningful content remains. TF-IDF (term frequency-inverse document frequency) weighting is used for feature selection, and PSO is then applied to identify and refine the feature set. The selected features are stored in a database that serves the retrieval process. On the search side, the user's query sequence is first transformed into multiple query strings, and the search is performed over the database using these strings. The performance of the search system is measured in terms of accuracy and error rate. The results are also compared with a traditional search model, demonstrating that the proposed technique is superior to the traditional search system.
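The abstract describes the pipeline only at a high level, so the following Python sketch illustrates one way the steps could fit together. It is not the authors' implementation. The stop-word list, the PSO parameters, the fitness function (summed TF-IDF weight minus a hypothetical per-term `penalty`), and the expansion of a query into its contiguous sub-sequences are all assumptions, as are the function names `extract_terms`, `tf_idf`, `pso_select`, `query_strings` and `search`. The optimizer is the standard binary PSO with a sigmoid transfer function.

```python
import math
import random
import re
from collections import Counter

# Hypothetical stop-word list; the paper's actual list is not given.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "with"}


def extract_terms(html: str) -> list[str]:
    """Remove HTML tags, special characters and stop words; return lowercase terms."""
    text = re.sub(r"<[^>]+>", " ", html)         # drop every HTML tag
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)  # drop special characters
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Classic TF-IDF per document: (count / doc length) * log(N / df)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return [{t: (c / len(doc)) * math.log(n / df[t]) for t, c in Counter(doc).items()}
            for doc in docs]


def pso_select(weights: dict[str, float], penalty: float = 0.1,
               n_particles: int = 20, n_iter: int = 50, seed: int = 0) -> set[str]:
    """Binary PSO that picks a subset of a page's terms as its stored features.

    Hypothetical fitness: summed TF-IDF weight of the chosen terms minus
    `penalty` per chosen term, which favours small, informative feature sets.
    """
    rng = random.Random(seed)
    terms = sorted(weights)
    dim = len(terms)
    w_inertia, c1, c2, v_max = 0.7, 1.5, 1.5, 4.0  # assumed PSO parameters

    def fitness(bits: list[int]) -> float:
        return sum(weights[t] - penalty for t, b in zip(terms, bits) if b)

    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w_inertia * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp velocity
                # Sigmoid transfer: velocity -> probability that this bit is 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:  # update personal best, then global best
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return {t for t, b in zip(terms, gbest) if b}


def query_strings(query: str) -> list[list[str]]:
    """Hypothetical query expansion: every contiguous sub-sequence of the cleaned query."""
    terms = extract_terms(query)
    return [terms[i:j] for i in range(len(terms)) for j in range(i + 1, len(terms) + 1)]


def search(query: str, index: dict[str, set[str]]) -> list[str]:
    """Rank pages by how many query strings are fully covered by their stored features."""
    qs = query_strings(query)
    scores = {page: sum(all(t in feats for t in q) for q in qs)
              for page, feats in index.items()}
    return sorted((p for p in scores if scores[p] > 0), key=lambda p: (-scores[p], p))
```

Because this illustrative fitness is additive, its optimum is every term whose weight exceeds `penalty`, which makes the sketch easy to check. A more realistic fitness (for example, one scored against relevance judgments) would change only the `fitness` function.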
