Abstract

Focused Crawler collects domain specific web page from the internet. However, the performance of focused web crawler depends upon the multidimensional nature of the web page. This paper presents a comprehensive analysis of recent web page classifiers for focused crawlers and also explores the impact of web-based feature in collaboration with web classifier. It also evaluates the performance of classification technique such as Support vector machine, Naive Bayes, Linear Regression and Random Forest over web page classification. Along with that it examines the impact of web feature i.e. anchor text, Page content and link over web page classification. Finally the paper yield interesting result about the collective response of web feature and classification technique for web page classification as a relevant class and irrelevant class.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.