Abstract
Edit distance is a fundamental measure of distance between strings, the ex- tensive study of which has recently focused on computational problems such as nearest neighbor search, sketching and fast approximation. A very powerful paradigm is to map the metric space induced by the edit distance into a normed space (e. g., '1) with small dis- tortion, and then use the rich algorithmic toolkit known for normed spaces. Although the minimum distortion required to embed edit distance into '1 has received a lot of attention lately, there is a large gap between known upper and lower bounds. We make progress on this question by considering large, well-structured submetrics of the edit distance metric space. Our main technical result is that the Ulam metric, namely, the edit distance on permu- tations of length at most n, embeds into '1 with distortion O(log n). This immediately leads to sketching algorithms with constant size sketches, and to efficient approximate nearest neighbor search algorithms, with approximation factor O(log n). The embedding and its algorithmic consequences present a big improvement over those previously known for the Ulam metric, and they are significantly better than the state of the art for edit distance in general. Further, we extend these results for the Ulam metric to edit distance on strings that are (locally) non-repetitive, i. e., strings where (close by) substrings are distinct.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.