Efficient indexing structure to handle durable queries through web crawling

R Suganya Devi, D Manjula, Vijayan Sugumaran

doi:10.1007/s10586-016-0595-4

Abstract

This paper studies the efficient processing of durable top-k queries on historical time series databases. Durable top-k queries extend snapshot top-k queries over a given time period. They play a key role in finding objects of durable quality and in predicting the status of these objects for successive time intervals by updating the query interval at every timestamp. Web crawling and indexing have become highly significant, especially for answering durable top-k queries efficiently over vast volumes of web documents. Existing algorithms return results that are of limited use to analysts. This paper focuses on web crawling, on indexing query terms under their respective categories, and on updating rank changes at every time interval. Links are crawled and accessed using a modified depth-first search (MDFS) algorithm, and metadata such as the title, keywords, and description are extracted. To handle query indexing, novel indexing techniques are proposed to yield efficient results. The study is intended to help analysts working with large volumes of crawled and indexed data by reducing their workload.
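To make the notion of a durable top-k query concrete, the following is a minimal brute-force sketch, not the indexing technique proposed in the paper. It assumes the common formulation in which an object is durable if it appears in the snapshot top-k for at least a given fraction of the timestamps in the query interval. The names `snapshot_top_k`, `durable_top_k`, `series`, and `ratio` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def snapshot_top_k(scores_at_t, k):
    """Return the k highest-scoring objects at a single timestamp."""
    # Sort by score descending; break ties by object id for determinism.
    ranked = sorted(scores_at_t.items(), key=lambda item: (-item[1], item[0]))
    return {obj for obj, _ in ranked[:k]}


def durable_top_k(series, k, start, end, ratio):
    """Return objects in the snapshot top-k for at least `ratio` of the
    timestamps in [start, end].

    `series` maps timestamp -> {object id -> score} (assumed layout).
    """
    timestamps = [t for t in sorted(series) if start <= t <= end]
    if not timestamps:
        return set()
    hits = Counter()
    for t in timestamps:
        # Count one hit per object for each timestamp it ranks in the top k.
        hits.update(snapshot_top_k(series[t], k))
    needed = ratio * len(timestamps)
    return {obj for obj, count in hits.items() if count >= needed}


# Toy data: three objects scored at four timestamps.
series = {
    1: {"a": 0.9, "b": 0.8, "c": 0.1},
    2: {"a": 0.7, "b": 0.2, "c": 0.6},
    3: {"a": 0.8, "b": 0.9, "c": 0.3},
    4: {"a": 0.4, "b": 0.5, "c": 0.9},
}
result = durable_top_k(series, k=2, start=1, end=4, ratio=0.75)
```

This scan recomputes every snapshot ranking in the interval, so its cost grows with both the interval length and the number of objects. That cost is the kind of overhead an index over rank changes is meant to avoid.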

Full Text