Given two strings X (|X|=m) and Y (|Y|=n) over an alphabet Σ, the edit distance between X and Y can be computed in O(mn/t) time with the help of the Four-Russians' lookup table whose block size is t. The Four-Russians' lookup table can be constructed in O((3|Σ|)2tt2) time using O((3|Σ|)2tt) space. However, the construction time and space requirement of the lookup table grow very fast as the alphabet size increases and thus it has been used only when |Σ| is very small. For example, when a string is a protein sequence, |Σ|=20 and thus it is almost impossible to use the Four-Russians' lookup table on typical workstations. In this paper, we present an efficient alphabet-independent Four-Russians' lookup table. It requires O(32t(2t)!t) space and can be constructed in O(32t(2t)!t2) time. Thus, the Four-Russians' lookup table can be constructed and used irrespective of the alphabet size. The time and space complexity were achieved by compacting the lookup table using a clever encoding of the preprocessed strings. Experimental results show that the space requirement of the lookup table is reduced to about 1/5,172,030 of its original size when |Σ|=26 and t=4. Furthermore, we present efficient multithreaded parallel algorithms for edit distance computation using the Four-Russians' lookup table. The parallel algorithm for lookup table construction runs in O(t) time and the parallel algorithm for edit distance computation between X and Y runs in O(m+n) time. Experiments performed on CUDA-supported GPU show that our algorithm runs about 942 times faster than the sequential version of the original Four-Russians' algorithm for 100 pairs of random strings of length approximately 1,000 when |Σ|=4 and t=4.
Read full abstract