Abstract

Query-based information retrieval is an essential part of a web search engine. Many researchers have applied various web mining techniques to find more relevant information for a keyword, but these techniques do not capture the correct meaning of the query term, whether it is a single word, a multiword term, or a phrase. In this paper we address the problem of searching for phrases. The phrase search process is three-fold: the whole phrase, the sequence of terms in the phrase, and the mingle of terms in the phrase. The user enters a query as a phrase, which is passed to various search engines that retrieve the top ‘n’ list of web pages. Preprocessing is first performed on the sequence of keywords in the phrase and on the mingle of keywords in the phrase. Features are then extracted from the web pages returned by the various search engines using the term frequency-inverse document frequency (TF-IDF) method. After feature extraction, the top ‘n’ web pages from the various search engines are grouped with the LBG clustering algorithm, using title-based, snippet-based, content-based, address-based, link-based, uniform resource locator-based, and co-occurrence-based calculations as parameters. An SVM classifier then identifies the unique links among the grouped web pages, and the proposed ranking algorithm assigns a rank value to each unique-link web page. The experiments show that precision, recall, F-measure, accuracy, speed, and error rate improve significantly over traditional search engines.
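
To illustrate the feature extraction step, the following is a minimal sketch of TF-IDF weighting over a small set of page texts. It is not the paper's implementation: the function name `tf_idf`, the whitespace tokenizer, and the unsmoothed logarithmic IDF are assumptions chosen for brevity, and the abstract does not specify which TF-IDF variant is used.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (hypothetical helper).

    Assumes lowercase whitespace tokenization and the plain form
    tf(t, d) * log(N / df(t)); other TF-IDF variants exist.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Example: snippets standing in for pages retrieved by different engines.
pages = ["web search engine", "phrase search engine", "phrase ranking"]
page_weights = tf_idf(pages)
```

In this sketch a term that appears in only one page (such as "web") receives a higher weight than a term shared by several pages (such as "search"), and a term present in every page receives a weight of zero.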

