Designing a fast file system crawler with incremental differencing

Tim Bisson, Shankar Pasupathy, Yuvraj Patel

doi:10.1145/2421648.2421652

Abstract

Search engines for storage systems rely on crawlers to gather the list of files that need to be indexed. The recency of an index is determined by the speed at which this list can be gathered. While there is a substantial body of literature on building efficient web crawlers, there is very little on file system crawlers. In this paper we discuss the challenges in building a file system crawler. We then present the design of two file system crawlers. The first uses the standard POSIX file system API but carefully controls the amount of memory and CPU that it uses. The second leverages modifications to the file system's internals, and a new API called SnapDiff, to detect modified files rapidly. For both crawlers we describe the incremental differencing design, that is, the method used to produce a list of changes between a previous crawl and the current point in time.
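The idea of incremental differencing over a POSIX-style crawl can be illustrated with a minimal sketch. This is not the authors' design and it does not use SnapDiff. It assumes a snapshot is simply a mapping from file path to (modification time, size), and the function names crawl and diff_snapshots are hypothetical, chosen only for this illustration.

```python
import os


def crawl(root):
    """Walk `root` iteratively and return {path: (mtime_ns, size)}.

    An explicit stack is used instead of recursion so that deep
    directory trees do not exhaust the call stack.
    """
    snapshot = {}
    stack = [root]
    while stack:
        directory = stack.pop()
        with os.scandir(directory) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    st = entry.stat(follow_symlinks=False)
                    snapshot[entry.path] = (st.st_mtime_ns, st.st_size)
    return snapshot


def diff_snapshots(previous, current):
    """Compare two snapshots and return (created, modified, deleted) paths.

    Assumption for this sketch: a file counts as modified when its
    (mtime_ns, size) pair differs between the two crawls.
    """
    created = sorted(current.keys() - previous.keys())
    deleted = sorted(previous.keys() - current.keys())
    modified = sorted(
        path
        for path in current.keys() & previous.keys()
        if current[path] != previous[path]
    )
    return created, modified, deleted
```

A caller would keep the snapshot from the previous crawl, run crawl again later, and pass both to diff_snapshots so that only the changed paths are handed to the indexer. This sketch holds the whole snapshot in memory, which is exactly the kind of cost a production crawler would need to control.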

Journal: ACM SIGOPS Operating Systems Review	Publication Date: Dec 18, 2012
Citations: 6
