Abstract

Compact bases formed by motifs called "irredundant", which are capable of generating all other motifs in a sequence, have been proposed in recent years and successfully tested in tasks of biosequence analysis and classification. Given a sequence $s$ of $n$ characters drawn from an alphabet $\Sigma$, the problem of extracting such a base from $s$ had previously been solved by two algorithms, in time $O(n^2 \log n \log |\Sigma|)$ and $O(|\Sigma| n^2 \log^2 n \log \log n)$, respectively, using the FFT-based string searching of Fischer and Paterson. More recently, a solution for binary strings taking time $O(n^2)$ without resorting to the FFT was also proposed. In the present paper, we consider the problem of incrementally extracting the bases of all suffixes of a string. This problem was solved in a previous work in time $O(n^3)$. A much faster incremental algorithm is described here, which takes time $O(n^2 \log n)$ for binary strings. Although this algorithm does not make use of the FFT, its performance is comparable to that of the previous FFT-based algorithms, which compute only one base. The implicit representation of a single base requires $O(n)$ space, whence, for finite alphabets, the proposed solution for all $n$ suffixes is within a $\log n$ factor of optimality.
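
To make the objects above concrete, the following is a minimal, naive sketch of a primitive commonly used when building such bases: aligning $s$ with each of its suffixes and taking their "meet" (agreeing characters are kept, disagreements become a don't care), then listing where each meet occurs in $s$. It is an illustration under assumed definitions, not the incremental algorithm described in this paper; the names `meet`, `occurrences`, `all_meets` and the wild-card symbol `.` are hypothetical, and the sketch only enumerates candidate motifs without deciding which of them are irredundant. Its occurrence-list step alone costs up to $O(n^3)$ time, which is the kind of cost the faster algorithms avoid.

```python
from typing import List, Optional, Tuple

DONT_CARE = "."  # hypothetical wild-card symbol; assumed absent from the alphabet


def meet(s: str, k: int) -> Optional[Tuple[int, str]]:
    """Meet of s with its suffix starting at position k, for 1 <= k < len(s).

    Position i keeps s[i] when s[i] == s[i + k] and becomes DONT_CARE otherwise;
    leading and trailing don't cares are then trimmed. Returns (offset, motif),
    where offset is the index in s of the motif's first solid character, or None
    when the two aligned strings agree nowhere.
    """
    if not 1 <= k < len(s):
        raise ValueError("shift k must satisfy 1 <= k < len(s)")
    raw = [s[i] if s[i] == s[i + k] else DONT_CARE for i in range(len(s) - k)]
    solid = [i for i, c in enumerate(raw) if c != DONT_CARE]
    if not solid:
        return None
    first, last = solid[0], solid[-1]
    return first, "".join(raw[first:last + 1])


def occurrences(s: str, motif: str) -> List[int]:
    """Every start position p at which motif matches s (DONT_CARE matches anything)."""
    return [
        p
        for p in range(len(s) - len(motif) + 1)
        if all(c == DONT_CARE or c == s[p + i] for i, c in enumerate(motif))
    ]


def all_meets(s: str) -> List[Tuple[int, str, List[int]]]:
    """For every shift k, return (k, meet motif, occurrence list), skipping empty meets.

    Naive: O(n^2) time for the meets, up to O(n^3) for the occurrence lists.
    Distinct shifts may yield the same motif; no deduplication is done here.
    """
    result = []
    for k in range(1, len(s)):
        m = meet(s, k)
        if m is not None:
            result.append((k, m[1], occurrences(s, m[1])))
    return result


if __name__ == "__main__":
    for k, motif, occ in all_meets("10110"):
        print(k, motif, occ)
```

For the binary string "10110", shifts 1 and 2 both yield the motif "1" (occurring at positions 0, 2 and 3), shift 3 yields "10" (at 0 and 3), and shift 4 yields no motif. A meet with an internal don't care arises, for example, from `meet("100110", 3)`, which returns `(0, "1.0")`, a motif occurring at positions 0 and 3.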
