Biosequence Analysis Research Articles

We have recently described a method based on artificial neural networks to cluster protein sequences into families. The network was trained with Kohonen's unsupervised learning algorithm, using as inputs the matrix patterns derived from the dipeptide composition of the proteins. We present here a large-scale application of that method to classify the 1,758 human protein sequences longer than 50 amino acids stored in the SwissProt database (release 19.0). In the final 2-dimensional, topologically ordered map of 15 x 15 neurons, proteins belonging to known families were associated with the same neuron or with neighboring ones. In an attempt to shorten the time-consuming learning procedure, we also compared 2 learning protocols: one of 500 epochs (100 SUN CPU-hours [CPU-h]) and one of 30 epochs (6.7 CPU-h). A further reduction of learning time, by a factor of about 3.3 and with similar protein clustering results, was achieved by using a matrix of 11 x 11 components to represent the sequences. Although network training is time consuming, classifying a new protein in the final ordered map is very fast (14.6 CPU-seconds). We also compare the artificial neural network approach with conventional methods of biosequence analysis.
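
The pipeline this abstract describes (dipeptide composition as input, a Kohonen map as classifier) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the linear decay of the learning rate, the Gaussian neighborhood, and the random weight initialization are all assumptions made for illustration, and the 11 x 11 reduced representation is not reproduced because the abstract does not say how it is built.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def dipeptide_matrix(seq):
    """Flattened 20 x 20 matrix of dipeptide frequencies for one sequence."""
    counts = [0.0] * 400
    total = 0
    for a, b in zip(seq, seq[1:]):
        # Skip pairs containing non-standard residue codes.
        if a in INDEX and b in INDEX:
            counts[INDEX[a] * 20 + INDEX[b]] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def best_neuron(weights, x):
    """(row, col) of the neuron whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda rc: sum((w - v) ** 2 for w, v in zip(weights[rc], x)),
    )


def train_som(patterns, side=15, epochs=30, seed=0):
    """Train a side x side Kohonen map (schedules below are assumed)."""
    rng = random.Random(seed)
    dim = len(patterns[0])
    weights = {
        (r, c): [rng.random() / dim for _ in range(dim)]
        for r in range(side)
        for c in range(side)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        rate = 0.5 * (1 - frac)                     # assumed linear decay
        radius = max(1.0, (side / 2) * (1 - frac))  # assumed shrinking radius
        for x in patterns:
            br, bc = best_neuron(weights, x)
            for (r, c), w in weights.items():
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighborhood
                for i in range(dim):
                    w[i] += rate * h * (x[i] - w[i])
    return weights
```

Once a map is trained, placing a new protein only requires one call to `best_neuron(weights, dipeptide_matrix(new_sequence))`, which is consistent with the abstract's point that classification is far cheaper than training.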


Novel methods are discussed for using fast Fourier transforms for DNA or protein sequence comparison. They are also intended as a contribution to the more general computer science problem of text search. They extend the capabilities of previous FFT methods and show that such methods are capable of considerable refinement. In particular, novel methods are given which (1) enable the detection of clusters of matching letters, (2) facilitate the insertion of gaps to enhance sequence similarity, and (3) accommodate varying densities of letters in the input sequences. These methods use Fourier analysis in two distinct ways: (1) fast Fourier transforms are used to facilitate rapid computation, and (2) Fourier expansions are used to form an 'image' of the sequence comparison.
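
As background for the first use of Fourier analysis mentioned above, the sketch below shows the basic, well-known FFT technique that such methods build on: counting matching letters at every alignment offset through a cross-correlation of per-letter indicator vectors. It does not implement the paper's refinements (cluster detection, gap insertion, density correction). The function names are hypothetical, and the small radix-2 FFT is written out only to keep the example self-contained; in practice a library routine such as `numpy.fft.fft` would normally be used.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two (unnormalized)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def match_counts(s, t):
    """Map each offset d to the number of positions j with s[j + d] == t[j]."""
    # Pad to a power of two large enough to avoid circular wrap-around.
    n = 1
    while n < len(s) + len(t) - 1:
        n *= 2
    spectrum = [0j] * n
    for letter in set(s) & set(t):
        a = [1.0 if ch == letter else 0.0 for ch in s] + [0.0] * (n - len(s))
        b = [1.0 if ch == letter else 0.0 for ch in t] + [0.0] * (n - len(t))
        fa, fb = fft(a), fft(b)
        # Cross-correlation theorem: multiply by the complex conjugate.
        for k in range(n):
            spectrum[k] += fa[k] * fb[k].conjugate()
    corr = fft(spectrum, invert=True)
    # Negative offsets are stored at index d mod n; divide by n to normalize.
    return {
        d: round(corr[d % n].real / n)
        for d in range(-(len(t) - 1), len(s))
    }
```

For example, `match_counts("ACGTACGT", "ACGT")` reports 4 matches at offsets 0 and 4, the two positions where the shorter sequence aligns exactly. Summing the per-letter spectra before a single inverse transform keeps the cost at one inverse FFT regardless of alphabet size.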



Articles published on Biosequence Analysis

Self‐organized neural maps of human protein sequences

Fourier methods for biosequence analysis
