Abstract

This paper proposes an approximate string matching method with k-mismatches for the generalized edit distance. Generalizing the edit distance enables more sophisticated string matching, but the execution time increases because each distance calculation involves more complex computations. In the generalized edit distance metric, the computational cost of finding which steps or edit distances are over k-mismatches is not significant. Therefore, we can reduce the execution time by determining the steps over k-mismatches and then skipping them. The diagonal step calculations using the pruning register skip unnecessary distance calculations over k-mismatches. The overhead of control statements and reordered memory accesses can be amortized by skipping multiple steps. Even though the proposed skipping method requires additional overhead, practical implementations of the proposed scheme show that the execution time of string matching is reduced significantly when k is small.

Highlights

  • In the field of computer science, information retrieval is a fundamental problem

  • We show the experimental results depending on different edit distance metrics

  • With generalized edit distance metrics that consider visual similarity in shapes or keyboard character positions, the proposed skipping method can outperform both dynamic programming and the method using the reordered data structure for small k-mismatches


Summary

Introduction

In the field of computer science, information retrieval is a fundamental problem. Notably, string matching is essential to digital information retrieval. Despite additional overhead in the diagonal step calculations and pruning register accesses, experiments show that the proposed skipping method can reduce the execution time of approximate string matching when k is small.

D(Xα, Yβ−1)k + insertion(yβ) when D(Xα, Yβ−1)k ≤ k.

In Eq (4), when the edit distance of a data-dependent previous step (D(Xα−1, Yβ−1)k, D(Xα−1, Yβ)k, and D(Xα, Yβ−1)k) is over k, there is no need to evaluate its operation for calculating D
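The skipping rule described for Eq (4) can be illustrated with a minimal Python sketch. It is not the paper's implementation: it fills the table row by row rather than by diagonal steps, and it uses a plain sentinel value in place of the pruning register. The function name `bounded_edit_distance`, the cost callbacks `sub_cost`, `ins_cost`, and `del_cost`, and the assumption that all costs are non-negative are hypothetical choices made for this illustration.

```python
import math

INF = math.inf  # sentinel for "this step is already over k" (assumption)


def bounded_edit_distance(x, y, k, sub_cost=None, ins_cost=None, del_cost=None):
    """Return the generalized edit distance of x and y if it is <= k, else None.

    Assumes every cost function returns a non-negative number, so a step
    that exceeds k can never lead back to a distance within k.
    """
    # Default to unit costs (plain Levenshtein distance).
    sub_cost = sub_cost or (lambda a, b: 0 if a == b else 1)
    ins_cost = ins_cost or (lambda c: 1)
    del_cost = del_cost or (lambda c: 1)

    m, n = len(x), len(y)

    # Row 0: only insertions are possible.
    prev = [0] + [INF] * n
    for b in range(1, n + 1):
        cost = prev[b - 1] + ins_cost(y[b - 1])
        prev[b] = cost if cost <= k else INF

    for a in range(1, m + 1):
        cur = [INF] * (n + 1)
        cost = prev[0] + del_cost(x[a - 1])
        cur[0] = cost if cost <= k else INF

        for b in range(1, n + 1):
            diag, up, left = prev[b - 1], prev[b], cur[b - 1]

            # All three data-dependent previous steps are over k:
            # skip the (possibly expensive) cost evaluations entirely.
            if diag == INF and up == INF and left == INF:
                continue

            best = INF
            # Evaluate an operation only when its previous step is within k.
            if diag != INF:
                best = min(best, diag + sub_cost(x[a - 1], y[b - 1]))
            if up != INF:
                best = min(best, up + del_cost(x[a - 1]))
            if left != INF:
                best = min(best, left + ins_cost(y[b - 1]))

            cur[b] = best if best <= k else INF

        prev = cur

    return None if prev[n] == INF else prev[n]
```

With unit costs, `bounded_edit_distance("kitten", "sitting", 3)` returns the usual edit distance, while a smaller bound such as `k=2` returns `None`. Passing a custom `sub_cost`, for example one that charges less for keyboard-adjacent characters, gives a generalized metric of the kind the abstract mentions; the more expensive that callback is, the more each skipped step saves.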

