Topic-specific Crawler Research Articles

<p><span lang="EN-IN">The sheer volume and variety of information on the web make it difficult for crawlers and search engines to extract relevant content. This paper discusses focused crawlers, also called topic-specific crawlers, and variations of focused crawlers leading to a distributed architecture, i.e., the context-aware notification architecture. A focused crawler retrieves the relevant pages from the huge amount of information available on the internet, and it can find the pages relevant to a given topic with fewer searches and in less time. The input to the focused crawler is a topic specified using exemplary documents rather than keywords. Instead of searching all web documents, a focused crawler searches only the links that are relevant to the crawler boundary. The focused crawling mechanism saves a large amount of CPU time, which helps keep the crawl up to date.</span></p>
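The idea in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the in-memory `TOY_WEB` graph, the cosine-similarity relevance measure, the `threshold` value, and all function names are assumptions made for illustration. The topic is defined by exemplar documents rather than keywords, and links are followed only from pages that score above the threshold, so off-topic regions are never expanded.

```python
import heapq
import math
import re
from collections import Counter

def term_vector(text):
    """Lowercased bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def focused_crawl(web, seeds, exemplars, threshold=0.2, max_pages=10):
    """Best-first crawl over `web` (url -> (text, links)), a stand-in for HTTP fetches."""
    topic = term_vector(" ".join(exemplars))   # topic built from exemplar documents
    frontier = [(-1.0, url) for url in seeds]  # max-heap via negated priority
    heapq.heapify(frontier)
    seen = set(seeds)
    relevant = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = web[url]
        fetched += 1
        score = cosine(term_vector(text), topic)
        if score < threshold:
            continue                           # off-topic page: do not follow its links
        relevant.append(url)
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                # An unvisited link inherits its parent's relevance as priority.
                heapq.heappush(frontier, (-score, link))
    return relevant

# Hypothetical five-page web used only for this example.
TOY_WEB = {
    "a": ("focused crawler topic relevance", ["b", "c"]),
    "b": ("crawler frontier relevance ordering", ["d"]),
    "c": ("cooking recipes pasta sauce", ["e"]),
    "d": ("topic crawler relevance", []),
    "e": ("focused crawler topic", []),
}
EXEMPLARS = ["focused crawler relevance", "topic crawler frontier"]

print(focused_crawl(TOY_WEB, ["a"], EXEMPLARS))  # ['a', 'b', 'd']
```

Page `e` is on topic but is reachable only through the off-topic page `c`, so this simple sketch never fetches it. That is the trade-off of a hard relevance boundary: fewer fetches, at the cost of missing relevant pages behind irrelevant ones.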

One of the major problems for automatically constructed portals and information discovery systems is how to prioritize unvisited web pages. Topic-specific crawlers and information-seeking agents should avoid traversing off-topic areas and concentrate on links that lead to documents of interest. In this paper, we propose an effective approach based on the relevancy context graph to solve this problem. The graph can estimate the distance and the degree of relevancy between a retrieved document and the given topic. By calculating the word distributions of the general and topic-specific feature words, our method preserves the property of the relevancy context graph and reflects it in the word distributions. With the help of the topic-specific and general word distributions, our crawler can measure a page's expected relevancy to a given topic and determine the order in which pages should be visited. Simulations show that our method outperforms both breadth-first crawling and the method that uses only the context graph.
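As a rough illustration of ordering unvisited pages with topic-specific and general word distributions, the sketch below scores each candidate by the mean log-likelihood ratio of its words under the two distributions. This is an assumed simplification, not the relevancy context graph method of the paper: the add-one smoothing, the scoring formula, the function names, and the sample data are all hypothetical, and the text attached to each unvisited URL is assumed to come from something like anchor text or the linking page.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def word_distribution(docs, vocab):
    """Add-one smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(t for d in docs for t in tokenize(d))
    total = sum(counts[w] for w in vocab) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def expected_relevancy(text, topic_dist, general_dist):
    """Mean log-likelihood ratio of known words; above 0 leans on-topic."""
    tokens = [t for t in tokenize(text) if t in topic_dist]
    if not tokens:
        return 0.0
    ratios = (math.log(topic_dist[t] / general_dist[t]) for t in tokens)
    return sum(ratios) / len(tokens)

def order_frontier(pages, topic_docs, general_docs):
    """Return URLs sorted from most to least expected relevancy."""
    vocab = {t for d in topic_docs + general_docs for t in tokenize(d)}
    topic_dist = word_distribution(topic_docs, vocab)
    general_dist = word_distribution(general_docs, vocab)
    scored = {url: expected_relevancy(text, topic_dist, general_dist)
              for url, text in pages.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Hypothetical training documents and candidate pages.
TOPIC_DOCS = ["crawler frontier relevance topic", "topic crawler link relevance"]
GENERAL_DOCS = ["news weather sports", "recipes travel news link"]
PAGES = {
    "u1": "topic crawler relevance",
    "u2": "weather news link",
    "u3": "crawler news",
}

print(order_frontier(PAGES, TOPIC_DOCS, GENERAL_DOCS))  # ['u1', 'u3', 'u2']
```

A crawler would visit `u1` first, then the mixed page `u3`, and leave `u2` for last. A fuller version would keep one distribution per context-graph layer so that the score also reflects the estimated link distance to on-topic documents.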

Articles published on Topic-specific Crawler

Focused crawling from the basic approach to context aware notification architecture

An Optimized Relevancy Context Graph Based on Social Network

Multi-level Frontier based Topic-specific Crawler Design with Improved URL Ordering

Topic-specific crawling on the Web with the measurements of the relevancy context graph
