Improved file synchronization techniques for maintaining large replicated collections over slow networks

T Suel, D Trendafilov, P Noel

doi:10.1109/icde.2004.1319992

Abstract

We study the problem of maintaining large replicated collections of files or documents in a distributed environment with limited bandwidth. This problem arises in a number of important applications, such as synchronization of data between accounts or devices, content distribution and Web caching networks, Web site mirroring, storage networks, and large-scale Web search and mining. At the core of the problem lies the following challenge, called the file synchronization problem: given two versions of a file on different machines, say an outdated and a current one, how can we update the outdated version with minimum communication cost by exploiting the significant similarity between the versions? While a popular open-source tool for this problem, rsync, is used in hundreds of thousands of installations, there have been very few attempts to improve upon it in practice. We propose a framework for remote file synchronization and describe several new techniques that result in significant bandwidth savings. Our focus is on applications where very large collections have to be maintained over slow connections. We show that a prototype implementation of our framework and techniques achieves significant improvements over rsync. As an example application, we focus on the efficient synchronization of very large Web page collections for the purpose of search, mining, and content distribution.
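To make the file synchronization problem concrete, the following is a minimal sketch of the general block-matching idea used by tools in the rsync family. It is not the framework proposed in the paper, and it is not rsync's actual implementation: real rsync pairs a cheap rolling checksum with a strong hash, whereas this sketch recomputes a strong hash at every offset for simplicity. The function names, the tuple-based delta format, and the tiny block size are all assumptions made for illustration.

```python
import hashlib

BLOCK = 4  # hypothetical block size, kept tiny so the example is easy to trace


def block_signatures(old: bytes, block: int = BLOCK) -> dict:
    """Machine holding the outdated file: hash each full fixed-size block."""
    sigs = {}
    for i in range(0, len(old) - block + 1, block):
        digest = hashlib.sha256(old[i:i + block]).digest()
        sigs.setdefault(digest, i // block)  # keep the first block index seen
    return sigs


def make_delta(new: bytes, sigs: dict, block: int = BLOCK) -> list:
    """Machine holding the current file: emit block references and literals."""
    delta, literal, pos = [], bytearray(), 0
    while pos < len(new):
        chunk = new[pos:pos + block]
        idx = None
        if len(chunk) == block:
            idx = sigs.get(hashlib.sha256(chunk).digest())
        if idx is not None:
            if literal:  # flush pending unmatched bytes first
                delta.append(("lit", bytes(literal)))
                literal = bytearray()
            delta.append(("ref", idx))  # the other side already has this block
            pos += block
        else:
            literal.append(new[pos])  # no match here: send the byte itself
            pos += 1
    if literal:
        delta.append(("lit", bytes(literal)))
    return delta


def apply_delta(old: bytes, delta: list, block: int = BLOCK) -> bytes:
    """Machine holding the outdated file: rebuild the current version."""
    out = bytearray()
    for kind, value in delta:
        if kind == "ref":
            out += old[value * block:(value + 1) * block]
        else:
            out += value
    return bytes(out)


old_version = b"the quick brown fox jumps"
new_version = b"the quick red fox jumps!"
delta = make_delta(new_version, block_signatures(old_version))
rebuilt = apply_delta(old_version, delta)
literal_bytes = sum(len(v) for k, v in delta if k == "lit")
```

In this sketch only the literal bytes and small block indices would need to cross the network, which is where the bandwidth saving comes from when the two versions are similar. Block signatures travel in the other direction, so the choice of block size trades signature traffic against match granularity.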
