Input String Research Articles

We propose three algorithms for string edit distance with duplications and contractions. These include an efficient general algorithm and two improvements which apply under certain constraints on the cost function. The new algorithms solve a more general problem variant and obtain better time complexities with respect to previous algorithms. Our general algorithm is based on min-plus multiplication of square matrices and has time and space complexities of O(|Σ|MP(n)) and O(|Σ|n²), respectively, where |Σ| is the alphabet size, n is the length of the strings, and MP(n) is the time bound for the computation of min-plus matrix multiplication of two n × n matrices (currently, due to an algorithm by Chan).

For integer cost functions, the running time is further improved to . In addition, this variant of the algorithm is online, in the sense that the input strings may be given letter by letter, and its time complexity bounds the processing time of the first n given letters. This acceleration is based on our efficient matrix-vector min-plus multiplication algorithm, intended for matrices and vectors for which differences between adjacent entries are from a finite integer interval D. Choosing a constant , the algorithm preprocesses an n × n matrix in time and space. Then, it may multiply the matrix with any given n-length vector in time. Under some discreteness assumptions, this matrix-vector min-plus multiplication algorithm applies to several problems from the domains of context-free grammar parsing and RNA folding and, in particular, implies the asymptotically fastest time algorithm for single-strand RNA folding with discrete cost functions.

Finally, assuming a different constraint on the cost function, we present another version of the algorithm that exploits the run-length encoding of the strings and runs in time and space, where is the length of the run-length encoding of the strings.
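For readers unfamiliar with the min-plus product that the abstract builds on, the following is a minimal Python sketch of its definition. It is the naive cubic-time (and quadratic-time matrix-vector) computation, not the accelerated algorithms described in the abstract; the function names `min_plus_matmul` and `min_plus_matvec` and the example matrix are illustrative assumptions.

```python
import math

INF = math.inf  # stands for "no finite cost"


def min_plus_matvec(A, x):
    """Return y with y[i] = min over k of (A[i][k] + x[k])."""
    return [min(a + b for a, b in zip(row, x)) for row in A]


def min_plus_matmul(A, B):
    """Return C with C[i][j] = min over k of (A[i][k] + B[k][j]).

    Naive definition-level computation, cubic in the matrix dimension.
    """
    cols = list(zip(*B))  # columns of B, so each entry pairs a row with a column
    return [[min(a + b for a, b in zip(row, col)) for col in cols] for row in A]


# Hypothetical 3 x 3 cost matrix, used only to exercise the functions.
A = [
    [0, 3, INF],
    [INF, 0, 1],
    [2, INF, 0],
]

product = min_plus_matmul(A, A)   # [[0, 3, 4], [3, 0, 1], [2, 5, 0]]
vector = min_plus_matvec(A, [5, 0, 2])  # [3, 0, 2]
```

In this product, addition plays the role of multiplication and minimum plays the role of addition, which is why it arises naturally when combining costs of adjacent substrings.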

We define a novel variation on the constrained sequence alignment problem in which the constraint is given in the form of a regular expression. Given two sequences, an alphabet Γ describing pairwise sequence alignment operations, and a regular expression R over Γ, the problem is to compute the highest scoring sequence alignment A of the given sequences, such that A∈Γ⁎L(R)Γ⁎.

Two algorithms are given for solving this problem. The first, basic algorithm is general and solves the problem in O(nmrlog2r) time and O(min{n,m}r) space, where m and n are the lengths of the two sequences and r is the size of the NFA for R. The second algorithm is restricted to rigid patterns and exploits this restriction to reduce the NFA size factor r in the time complexity to a smaller factor corresponding to the length of the rigid pattern. A rigid pattern P is a regular expression of the form P=P1∪⋯∪Pk, where Pi does not contain the Kleene-closure star or union. The pattern length |P| is compacted by supporting alignment patterns extended by meta-characters, including general insertion, deletion and match operations, as well as some cases of substitutions.

For a pattern P∈(Γ∪{m,i})⁎ or P∈(Γ∪{m,d})⁎, the problem can be solved in time O(nm), while for a pattern P∈(Γ∪{m,i,d})⁎, the problem can be solved in time O(nmlog|P|). For a pattern P∈(Γ∪{m,s,i,d})⁎, the problem can be solved in time O(nmlog|P|) in some cases: one case is for scoring functions Score for which there exists Score′:Σ→R such that Score(ν,σ)=Score′(ν)+Score′(σ) for every ν≠σ, and the other is when occs(P)=O(log(max{n,m})). For a rigid pattern P=P1∪⋯∪Pk, these time bounds range from O(knm) to O(knmlog(max{|Pi|})), depending on the meta-characters used in P.

An additional result obtained along the way is an extension of the algorithm of Fischer and Paterson for String Matching with Wildcards. Our extension allows the input strings to include “negation symbols” (that match all letters but a specific one) while retaining the original time complexity.

We implemented both algorithms and applied them to data-mine new miRNA seeding patterns in C. elegans Clip-seq experimental data.
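To make the notion of "negation symbols" concrete, here is a minimal Python sketch of matching a pattern that mixes letters, wildcards, and negation symbols against a plain text. It is a naive position-by-position check, not the convolution-based method of Fischer and Paterson or the extension described in the abstract, and it assumes special symbols occur only in the pattern. The names `ANY`, `Not`, `symbol_matches`, and `find_matches` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Not:
    """Hypothetical negation symbol: matches every letter except `letter`."""
    letter: str


ANY = object()  # hypothetical wildcard marker: matches every letter


def symbol_matches(p, c):
    """Return True if pattern symbol p matches text letter c."""
    if p is ANY:
        return True
    if isinstance(p, Not):
        return c != p.letter
    return p == c


def find_matches(text, pattern):
    """Return all start positions where pattern matches text (naive scan)."""
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if all(symbol_matches(p, text[i + j]) for j, p in enumerate(pattern))
    ]


# "a", then any letter, then any letter other than "c".
positions = find_matches("abcabd", ["a", ANY, Not("c")])  # [3]
```

The occurrence at position 0 is rejected because its third letter is "c", while the occurrence at position 3 ends in "d" and is accepted.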

Articles published on Input String

Parsing entire discourses as very long strings: Capturing topic continuity in grounded language learning

Adaptive non-critical alarm reduction using hash-based contextual signatures in intrusion detection

Efficient edit distance with duplications and contractions

Irreversibility and dissipation in finite-state automata

Exact online two-dimensional pattern matching using multiple pattern matching algorithms

Algorithms for path-constrained sequence alignment

Parsing by matrix multiplication generalized to Boolean grammars

Towards adaptive character frequency-based exclusive signature matching scheme and its applications in distributed intrusion detection

A Speculative Parallel DFA Membership Test for Multicore, SIMD and Cloud Computing Environments

Irredundant tandem motifs

Practical linear-time O(1)-workspace suffix sorting for constant alphabets

Bounded repairability of word languages

The impact of number mismatch and passives on the real-time processing of relative clauses

On approximating string selection problems with outliers

Adaptive blacklist-based packet filter with a statistic-based approach in network intrusion detection

Efficient repeat finding in sets of strings via suffix arrays

PASQUAL: Parallel Techniques for Next Generation Genome Sequence Assembly

The Moments Of The Profile In Random Binary Digital Trees

String analysis by sliding positioning strategy

Symbolic PathFinder: integrating symbolic execution with model checking for Java bytecode analysis

