Binary Alphabet Research Articles

A central question in computational biology is the design of genetic markers to distinguish between two given sets of (DNA) sequences. This question is formalized as the NP-complete Distinguishing Substring Selection problem (DSSS for short), which asks, given a set of "good" strings and a set of "bad" strings, for a solution string which is, with respect to the Hamming metric, "away" from the good strings and "close" to the bad strings. More precisely, given integers dg, db, and L, we ask for a length-L string s such that s has Hamming distance at least dg to every length-L substring of the good strings and such that every bad string has a length-L substring with Hamming distance at most db to s. Studying the parameterized complexity of DSSS, we show that, even for a binary alphabet, DSSS is W[1]-hard with respect to its natural parameters. This, in particular, implies that a recently given polynomial-time approximation scheme (PTAS) by Deng et al. cannot be replaced by a so-called efficient polynomial-time approximation scheme (EPTAS) unless an unlikely collapse in parameterized complexity theory occurs. This is seemingly the first computational biology problem for which such a border between PTAS (which exists) and EPTAS (which is unlikely to exist) could be established. By way of contrast, for a special case of DSSS, we present an exact fixed-parameter algorithm solving the problem efficiently. In this way we also exhibit a sharp border between fixed-parameter tractability and intractability results.
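To make the DSSS definition above concrete, the following is a minimal sketch that checks a candidate string against the two conditions and searches all length-L strings exhaustively. It is not the fixed-parameter algorithm from the abstract; the function names (`hamming`, `windows`, `is_solution`, `brute_force_dsss`) are hypothetical, and the exhaustive search takes time exponential in L, so it is only suitable for toy instances.

```python
from itertools import product


def hamming(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def windows(s, L):
    """All length-L substrings of s (empty list if s is shorter than L)."""
    return [s[i:i + L] for i in range(len(s) - L + 1)]


def is_solution(s, good, bad, dg, db):
    """Check the two DSSS conditions for a candidate string s."""
    L = len(s)
    # s must be at distance >= dg from every length-L substring of every good string.
    far_from_good = all(
        hamming(s, w) >= dg for g in good for w in windows(g, L)
    )
    # Every bad string must contain some length-L substring at distance <= db from s.
    close_to_bad = all(
        any(hamming(s, w) <= db for w in windows(b, L)) for b in bad
    )
    return far_from_good and close_to_bad


def brute_force_dsss(good, bad, L, dg, db, alphabet="01"):
    """Try every length-L string over the alphabet (illustrative assumption:
    exhaustive search, exponential in L). Returns a solution or None."""
    for cand in product(alphabet, repeat=L):
        s = "".join(cand)
        if is_solution(s, good, bad, dg, db):
            return s
    return None
```

For example, with good strings `["0000"]`, bad strings `["1111"]`, L = 3, dg = 2, and db = 1, the candidate `"011"` differs from `"000"` in two positions and from `"111"` in one, so it satisfies both conditions.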

We present a binary coding algorithm for α‐ and β‐protein fold prediction. The method links amino acid molecular polarity patterns and physicochemical properties of nucleotide bases coded by means of binary addresses. Primary sequences that define secondary protein structure were analyzed with respect to the symbolic oligopeptides (SO) obtained by the reduction of the 20 amino acid letter alphabet into a binary alphabet of nonpolar group 0 (W, C, I, F, M, V, L, Y) and polar group 1 (Q, R, H, K, N, E, D, S, G, T, A, P). The groups were extracted from the Grantham polarity scale with the clustering around medoids procedure. The transformation of protein strings into binary coding patterns of the polar and nonpolar amino acid groups reduced the number of analyzed elements within a protein motif of length n by a factor of 10^n. The SMO learning algorithm for support vector machines was applied to classify α‐helices and β‐strands. It was shown that the relative frequencies of binary hexapeptides classify all 174 nonhomologous α‐ and β‐protein folds from the Jpred database with 100% accuracy. The results of 10‐fold cross‐validation and the leave‐one‐out test were 86.78%. A classification tree confirmed the results of the SMO analysis and correctly classified 100% of the folds by means of 9 binary hexapeptides. A linear block triple‐check code was proposed for the description of hexapeptide patterns. The presented method enables simple, quick, and accurate prediction of α‐ and β‐protein folding types from the primary amino acid and nucleotide sequences on a personal computer. Our results imply that a few amino acid polarity patterns specified by the nucleotide physicochemical properties describe basic protein folding types with >90% accuracy. © 2003 Wiley Periodicals, Inc. Int J Quantum Chem, 2003
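The alphabet reduction described above can be sketched as follows: each residue is mapped to 0 (nonpolar) or 1 (polar) using the two groups listed in the abstract, and relative frequencies of binary hexapeptides are then counted. The function names (`to_binary`, `kmer_frequencies`) are hypothetical, and counting over overlapping windows is an assumption, since the abstract does not say how the frequencies were computed. Note that a motif of length n has 20^n possible amino acid strings but only 2^n binary strings, which is where a reduction factor of 10^n comes from.

```python
from collections import Counter

# Groups as listed in the abstract (derived there from the Grantham polarity scale).
NONPOLAR = set("WCIFMVLY")      # coded as 0
POLAR = set("QRHKNEDSGTAP")     # coded as 1


def to_binary(protein):
    """Map a one-letter amino acid sequence to a polarity string over {0, 1}."""
    out = []
    for aa in protein.upper():
        if aa in NONPOLAR:
            out.append("0")
        elif aa in POLAR:
            out.append("1")
        else:
            raise ValueError(f"unknown residue: {aa!r}")
    return "".join(out)


def kmer_frequencies(binary, k=6):
    """Relative frequencies of length-k patterns (k=6 gives binary hexapeptides).
    Assumption: overlapping windows, each counted once."""
    total = len(binary) - k + 1
    if total <= 0:
        return {}
    counts = Counter(binary[i:i + k] for i in range(total))
    return {kmer: c / total for kmer, c in counts.items()}
```

With k = 6 there are only 2^6 = 64 possible binary hexapeptides, so each sequence yields a feature vector with at most 64 entries, which could then be passed to a classifier of the reader's choice.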

Articles published on Binary Alphabet

Non-approximability of weighted multiple sequence alignment for arbitrary metrics

FFT-based algorithms for the string matching with mismatches problem

Parameterized Intractability of Distinguishing Substring Selection

First-order expressibility of languages with neutral letters or: The Crane Beach conjecture

State complexity of some operations on binary regular languages

Type inference for light affine logic via constraints on words

Hardness results for the center and median string problems under the weighted and unweighted edit distances

On deterministic finite automata and syntactic monoid size

Randomness relative to Cantor expansions

On the Capacity Loss Due to Separation of Detection and Decoding

Properties of the complexity function for finite words

Markov Types and Minimax Redundancy for Markov Sources

Patterns in words and languages

On enumeration problems in Lie–Butcher theory

Algorithmic complexity of protein identification: combinatorics of weighted strings

Developing an Infrastructure for Sharing Environmental Models

Prediction of secondary protein structure with binary coding patterns of amino acid and nucleotide physicochemical properties

On the dimensions of the spectral measure of symmetric binary substitutions

Bounds for parametric sequence comparison

On-Line Approximate String Searching Algorithms: Survey and Experimental Results
