We study two basic problems regarding edit errors: document exchange and error-correcting codes. In the first, two parties holding strings of length roughly $n$ with edit distance at most $k$ try to exchange them. In the second, one party tries to send a string of length $n$ to another party through a channel that can introduce at most $k$ edit errors. The goal is to use as little communication or redundancy as possible. Both problems have been studied extensively for decades; in this article, we focus on deterministic document exchange protocols and binary codes for insertions and deletions (insdel codes). It is known that for small $k$ (e.g., $k \le n/4$), the optimal communication or redundancy in both problems is $\Theta(k \log(n/k))$. In particular, this implies the existence of binary codes that can correct an $\varepsilon$ fraction of edit errors with rate $1 - \Theta(\varepsilon \log(1/\varepsilon))$. However, known constructions are far from achieving these bounds.

In this article, we significantly improve previous results on both problems. For document exchange, we give an efficient deterministic protocol with communication complexity $O(k \log^2(n/k))$. This significantly improves the previous best-known deterministic protocol, which has communication complexity $O(k^2 + k \log^2 n)$ [4]. For binary insdel codes, we obtain the following results:

(1) An explicit binary insdel code with redundancy $O(k \log^2(n/k))$. In particular, this implies an explicit family of binary insdel codes that can correct an $\varepsilon$ fraction of insertions and deletions with rate $1 - O(\varepsilon \log^2(1/\varepsilon)) = 1 - \tilde{O}(\varepsilon)$. This rate is optimal up to a $\log(1/\varepsilon)$ factor and significantly improves the previous best-known result, which only achieves rate $1 - \tilde{O}(\sqrt{\varepsilon})$ [14], [15].

(2) An explicit binary insdel code with redundancy $O(k \log n)$. This significantly improves the previous best-known result of Reference [6], which only works for constant $k$ and has redundancy $O(k^2 \log k \log n)$, and that of Reference [4], which has redundancy $O(k^2 + k \log^2 n)$. Our code has optimal redundancy for $k \le n^{1-\alpha}$, for any constant $0 < \alpha < 1$. This is the first explicit construction of binary insdel codes with optimal redundancy for a wide range of error parameters $k$.

In obtaining our results, we introduce several new techniques. Most notably, we introduce the notions of $\varepsilon$-self-matching hash functions and $\varepsilon$-synchronization hash functions. We believe our techniques can have further applications in the literature.
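The two "in particular" claims above follow from the redundancy bounds by short calculations. The derivation below is a sketch under stated assumptions, not taken from the article. It writes $N$ for the codeword length, so the rate is $1 - r/N$ for redundancy $r$. It takes $k = \varepsilon N$ and uses the fact that $N \ge n$, so $r = O(k \log^2(n/k))$ also gives $r = O(k \log^2(N/k))$. The last lines show why $O(k \log n)$ matches the optimal $\Theta(k \log(n/k))$ when $k \le n^{1-\alpha}$.

```latex
% Assumptions (for illustration): codeword length N, redundancy r, k = \varepsilon N.
\begin{align*}
\text{rate} &= 1 - \frac{r}{N}, \qquad r = O\!\left(k \log^2 \tfrac{N}{k}\right), \qquad k = \varepsilon N \\
\Rightarrow\ \text{rate} &= 1 - O\!\left(\frac{\varepsilon N \log^2(1/\varepsilon)}{N}\right)
  = 1 - O\!\left(\varepsilon \log^2 \tfrac{1}{\varepsilon}\right) = 1 - \tilde{O}(\varepsilon), \\
\frac{\varepsilon \log^2(1/\varepsilon)}{\varepsilon \log(1/\varepsilon)} &= \log\tfrac{1}{\varepsilon}
  \quad \text{(gap to the optimal rate } 1 - \Theta(\varepsilon \log(1/\varepsilon))\text{)}, \\[4pt]
k \le n^{1-\alpha} &\Rightarrow \frac{n}{k} \ge n^{\alpha}
  \Rightarrow \log\frac{n}{k} \ge \alpha \log n \\
&\Rightarrow k \log n \le \frac{1}{\alpha}\, k \log\frac{n}{k}
  = O\!\left(k \log\frac{n}{k}\right) \quad \text{for constant } \alpha .
\end{align*}
```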
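To make the insertion/deletion error model concrete, the sketch below computes the insdel distance between two strings. This is the minimum number of insertions and deletions that turn one string into the other. It uses the standard identity $|x| + |y| - 2\,\mathrm{LCS}(x, y)$, where LCS is the length of a longest common subsequence, computed by the textbook quadratic-time dynamic program. The function names are hypothetical. This is only a reference computation of the error measure, not part of the protocols or codes described above. Note that a substitution counts as at most two insdel operations.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of a longest common subsequence of x and y (O(|x|*|y|) DP, one row at a time)."""
    prev = [0] * (len(y) + 1)          # DP row for the prefix of x processed so far
    for a in x:
        cur = [0]                      # cur[j] = LCS of (x prefix incl. a, y[:j])
        for j, b in enumerate(y, start=1):
            if a == b:
                cur.append(prev[j - 1] + 1)            # extend a common subsequence
            else:
                cur.append(max(prev[j], cur[j - 1]))   # skip a char of x or of y
        prev = cur
    return prev[-1]


def insdel_distance(x: str, y: str) -> int:
    """Minimum number of insertions and deletions that transform x into y."""
    return len(x) + len(y) - 2 * lcs_length(x, y)


if __name__ == "__main__":
    print(insdel_distance("10110", "0110"))  # delete the leading bit
    print(insdel_distance("0101", "1010"))   # delete the first 0, append a 0
```

For example, `insdel_distance("10110", "0110")` is 1 and `insdel_distance("0101", "1010")` is 2. In the settings above, the two parties' strings, or a codeword and its corrupted version, are assumed to be within a distance budget of $k$.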