Abstract
The World Wide Web is highly dynamic: web pages are updated, deleted, created, or moved from one domain to another every day. As a result, web users often encounter broken links, and despite its modern services, the Internet continues to suffer from this problem. A link breaks when the page it points to has disappeared permanently or has moved to another location. There are numerous causes: web pages may be permanently deleted, modifications made to web pages may break existing links, or the link to the target page may contain errors in the code of the source page. Researchers have proposed several techniques to recover broken links, or at least to retrieve relevant pages. The research community has used a number of information sources for broken-link recovery, such as the URL of the target page, the anchor text, the text surrounding the anchor text, and the text of the source page. All of these sources are useful for retrieving candidate pages relevant to a broken link: a query extracted from them is submitted to the system, which returns a ranked list of highly relevant candidate pages. Previous work relies on TF (Term Frequency) or DF (Document Frequency) weights to extract terms from the anchor text and the full text of the page containing the missing links, but these approaches have not shown good results, and they lead to similar pages being retrieved for multiple broken links. In this paper, we investigate the use of the term proximity (position) relationship between the terms of the anchor text and the full text, and use a classification model to separate relevant (good) terms from irrelevant (bad) ones. This addresses the problem by providing different query terms for different broken links, and it also improves effectiveness, since terms that occur close to each other tend to be more relevant.
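The core intuition, that page terms occurring near a link's anchor text are better query terms for that particular link, can be illustrated with a simple proximity score. This is a minimal sketch under stated assumptions, not the classification model used in this paper: the functions `tokenize`, `proximity_scores`, and `top_query_terms`, the stopword list, the window size, and the 1/distance weighting are all hypothetical choices. Each candidate term accumulates 1/d for every anchor-term occurrence within `window` tokens of it, so two broken links on the same page can yield different query terms.

```python
import re
from collections import defaultdict

# Hypothetical stopword list for illustration; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "on", "for", "and", "about"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def proximity_scores(page_text, anchor_text, window=10):
    """Score page terms by their closeness to occurrences of anchor-text terms.

    Assumption (illustrative only): each occurrence of a candidate term gains
    1/d for every anchor-term occurrence at token distance d <= window.
    """
    tokens = tokenize(page_text)
    anchor_terms = set(tokenize(anchor_text))
    anchor_positions = [i for i, t in enumerate(tokens) if t in anchor_terms]
    scores = defaultdict(float)
    for i, term in enumerate(tokens):
        # Anchor terms themselves and stopwords are not candidate query terms.
        if term in anchor_terms or term in STOPWORDS:
            continue
        for p in anchor_positions:
            d = abs(i - p)  # always >= 1, since position i holds a non-anchor term
            if d <= window:
                scores[term] += 1.0 / d
    return dict(scores)


def top_query_terms(page_text, anchor_text, k=3, window=10):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = proximity_scores(page_text, anchor_text, window)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    page = ("read the engine review for speed tests and later "
            "the habitat guide about prey animals")
    # Two links on the same page whose targets are assumed to be broken.
    print(top_query_terms(page, "engine review", k=2, window=3))  # -> ['read', 'speed']
    print(top_query_terms(page, "habitat guide", k=2, window=3))  # -> ['later', 'prey']
```

With `window=3`, the first link yields `read` and `speed` while the second yields `later` and `prey`, whereas a ranking based only on term frequency over the full page would return the same terms for both links. An uninformative word such as `later` can still surface from distance alone, which suggests why a learned model separating good from bad terms could help beyond raw proximity.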