Abstract
To evaluate the informative content of a Web page, the structure of the Web has to be carefully analyzed. Hyperlink analysis, which measures the potential information contained in a Web page with respect to the Web space, is gaining attention. The links to and from Web pages are an important resource that has largely gone unused in existing search engines. Web pages differ from general text in that they possess external and internal structure. The links between Web documents can provide useful information for finding pages on a given set of topics, and making use of this link information would allow the construction of more powerful tools for answering user queries. Google was among the first search engines to utilize hyperlinks in page ranking. Still, two main flaws in Google need to be tackled. First, all backlinks to a page are assigned equal weights. Second, less content-rich pages, such as intermediate and transient pages, are not differentiated from more content-rich pages. To overcome these pitfalls, this paper proposes a heuristic-based solution that differentiates the significance of backlinks by assigning each a weight factor that depends on its location in the directory tree of the Web space.
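The abstract does not give the weighting formula itself, so the following is only a minimal sketch of the general idea, assuming a simple exponential decay with directory depth; the function names (`directory_depth`, `backlink_weight`, `weighted_score`) and the `decay` parameter are hypothetical illustrations, not the paper's actual formulation:

```python
from urllib.parse import urlparse

def directory_depth(url):
    """Depth of a URL's path in the site's directory tree (site root = 0)."""
    path = urlparse(url).path.strip("/")
    return len(path.split("/")) if path else 0

def backlink_weight(url, decay=0.5):
    """Assumed depth-based weight: backlinks from pages higher in the
    directory tree (e.g. site home pages) count more than backlinks
    from deeply nested, likely transient pages."""
    return decay ** directory_depth(url)

def weighted_score(backlinks, decay=0.5):
    """Score a target page as the sum of its backlinks' weights,
    rather than counting every backlink equally."""
    return sum(backlink_weight(url, decay) for url in backlinks)

# Example: a link from a site root outweighs one from a deep subdirectory.
links = [
    "http://example.edu/",                 # depth 0 -> weight 1.0
    "http://example.edu/research/",        # depth 1 -> weight 0.5
    "http://example.edu/a/b/c/page.html",  # depth 4 -> weight 0.0625
]
print(weighted_score(links))  # 1.5625, versus 3 under equal weighting
```

Under a scheme of this shape, the two flaws noted above are addressed together: backlinks no longer carry equal weight, and links originating from deep, intermediate, or transient pages contribute less to a page's rank than links from prominent locations in the Web space.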
Highlights
The Web is growing rapidly, and as an important new medium for communication it provides a tremendous amount of information on a wide range of topics, continually creating new challenges for information retrieval
We propose a heuristic-based solution to re-rank the results returned by a text-based search engine
In this study we propose an alternative solution to improve the ranking capability of Google
Summary
The Web is growing rapidly, and as an important new medium for communication it provides a tremendous amount of information on a wide range of topics, continually creating new challenges for information retrieval. A search engine provides users with a means to search for valuable information on the Web. Traditionally, Web search engines, which rely on keyword matching and frequency, visit Web sites, fetch pages, and analyze text information to build indexes. One problem with text-based search engines is that many Web pages among the returned results are low-quality matches. It is common practice for some developers to attempt to gain attention by taking measures meant to mislead automated search engines, such as adding spurious keywords to trick a search service into ranking a page highly for a popular subject. How to select the highest-quality Web pages for placement at the top of the result list is the main concern of search engine design.