접미사 배열을 이용한 Suffix-Prefix가 일치하는 모든 쌍 찾기

Seon-Mi Han,Jin-Woon Woo

doi:10.3745/kipsta.2010.17a.5.221

Abstract

최근 문자열 연산들이 계산 생물학 및 인터넷의 보안, 검색 분야에 응용되면서 효율적인 문자열 연산을 위한 다양한 자료구조와 알고리즘이 연구되고 있다. suffix-prefix가 일치하는 모든 쌍 찾기는 두 개 이상의 문자열이 주어질 때 각 쌍의 문자열에 대해 가장 긴 suffix와 일치하는 prefix를 찾는 것으로 가장 짧은 슈퍼스트링을 검출하는 근사 알고리즘에서 사용될 뿐만 아니라 생물정보학, 데이터 압축 분야에서도 중요하게 사용된다. 본 논문에서는 접미사 배열을 이용하는 suffix-prefix가 일치하는 모든 쌍 찾기 알고리즘을 제안하며 O(<TEX>$k{\cdot}m$</TEX>) 시간 복잡도를 가진다. 접미사 배열 알고리즘이 접미사 트리 알고리즘 보다 소요 시간과 메모리 면에서 더 우수함을 실험을 통해서 제시한다. Since string operations were applied to computational biology, security and search for Internet, various data structures and algorithms for computing efficient string operations have been studied. The all-pairs suffix-prefix matching is to find the longest suffix and prefix among given strings. The matching algorithm is importantly used for fast approximation algorithm to find the shortest superstring, as well as for bio-informatics and data compressions. In this paper, we propose an algorithm to find all-pairs suffix-prefix matching using the suffix array, which takes O(<TEX>$k{\cdot}m$</TEX>)�� time complexity. The suffix array algorithm is proven to be better than the suffix tree algorithm by showing it takes less time and memory through experiments.

Full Text