An extension of the FFT‐based algorithm for the match‐count problem to weighted scores

Kensuke Baba

doi:10.1002/tee.22554

Abstract

The match‐count problem on strings is the basic problem of counting the character matches between two strings for every possible alignment. For two strings of lengths m and n (m ≤ n) over an alphabet of size σ, the problem is classically solved in O(σ n log m) time using a fast Fourier transform (FFT). This paper extends the FFT‐based algorithm to a weighted version of the problem, which computes the sum of similarities between characters instead of the number of matches. By mapping characters to numerical vectors of dimensionality d, the extended algorithm solves the weighted match‐count problem in O(dn log m) time. This paper also evaluates the usefulness of the extended algorithm by applying it to plagiarism detection in documents. The experimental results show that the proposed algorithm is applicable to general vector representations of words and that the resulting plagiarism detection method substantially reduces processing time, with a slight decrease in accuracy compared with the method based on the standard match‐count problem.
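
The idea in the abstract can be illustrated with a minimal sketch. It is not the author's implementation. It assumes that only full overlaps of the shorter string are scored, and it uses one FFT over the whole text, which costs O(dn log n) rather than the O(dn log m) stated above (the usual way to reach log m is to process the text in overlapping blocks of length about 2m, which is omitted here). The names `weighted_match_count`, `one_hot`, and `embed` are hypothetical and chosen for illustration.

```python
import numpy as np


def weighted_match_count(text, pattern, embed):
    """Return s[i] = sum_j <embed[text[i+j]], embed[pattern[j]]> for i = 0..n-m.

    `embed` maps each character to a length-d numeric vector (an assumption
    of this sketch, standing in for the character-to-vector mapping).
    """
    n, m = len(text), len(pattern)
    T = np.array([embed[c] for c in text], dtype=float)     # shape (n, d)
    P = np.array([embed[c] for c in pattern], dtype=float)  # shape (m, d)
    size = n + m - 1  # length of the full linear convolution
    # Correlation equals convolution with the reversed pattern; one FFT
    # product per vector dimension, then a sum over the d dimensions.
    FT = np.fft.rfft(T, n=size, axis=0)
    FP = np.fft.rfft(P[::-1], n=size, axis=0)
    conv = np.fft.irfft(FT * FP, n=size, axis=0).sum(axis=1)
    return conv[m - 1 : n]  # keep alignments where the pattern fits fully


def one_hot(alphabet):
    """One-hot vectors (d = alphabet size) recover the plain match count."""
    eye = np.eye(len(alphabet))
    return {c: eye[k] for k, c in enumerate(alphabet)}


if __name__ == "__main__":
    counts = weighted_match_count("abcabc", "abc", one_hot("abc"))
    print(np.rint(counts).astype(int))
```

With one-hot vectors the scores are ordinary match counts, so d equals σ. Replacing them with lower-dimensional similarity-preserving vectors is what brings the cost factor down from σ to d in this sketch.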

Journal: IEEJ Transactions on Electrical and Electronic Engineering
Publication Date: Dec 1, 2017
Citations: 3

Similar Papers

Packed Compact Tries: A Fast and Efficient Data Structure for Online String Processing
Takuya Takagi ... Shunsuke Inenaga
01 Jan 2015

Inferring strings from Lyndon factorization
Yuto Nakashima ... Masayuki Takeda
Theoretical Computer Science | VOL. 689
12 Jun 2017

Improved Approximation for Longest Common Subsequence over Small Alphabets.
...
01 Jan 2020

An acceleration of FFT-based algorithms for the match-count problem
Kensuke Baba
Information Processing Letters | VOL. 125
27 Apr 2017
