Abstract

The introduction of two-dimensional (2D) graphs and their numerical characterization for comparative analyses of DNA/RNA and protein sequences, without the need for sequence alignments, is an active and relatively recent research topic in bioinformatics. Here, we used a 2D artificial representation (four-color maps) with a simple numerical characterization through topological indices (TIs) to aid the discovery of remote homologs of Adenylation domains (A-domains) from the Nonribosomal Peptide Synthetases (NRPS) class in the proteome of the cyanobacterium Microcystis aeruginosa. Cyanobacteria are a rich source of structurally diverse oligopeptides that are predominantly synthesized by NRPS. Several A-domains share amino acid identities lower than 20%, making them a possible source of remote homologs. Therefore, A-domains cannot be easily retrieved by BLASTp searches using a single template. To cope with the sequence diversity of the A-domains, we combined homology-search methods with an alignment-free tool that uses protein four-color maps. TI2BioP (Topological Indices to BioPolymers) version 2.0, available at http://ti2biop.sourceforge.net/, allowed the calculation of simple TIs from the protein sequences (four-color maps). These TIs were used as input predictors for the statistical estimations required to build the alignment-free models. We concluded that graphical/numerical approaches used in cooperation with other sequence search methods, such as multi-template BLASTp and profile HMMs, can give the most complete exploration of the repertoire of highly diverse protein families.

Highlights

  • Chemical Graph Theory (CGT) applies graph theory to the combinatorial and topological exploration of chemical molecular structure

  • Cyanobacteria are a rich source of structurally diverse oligopeptides that are predominantly synthesized by Nonribosomal Peptide Synthetases (NRPS)

  • We concluded that the use of an ensemble of sequence search methods can give the best exploration of the repertoire of highly diverse protein classes, such as the NRPS represented by its A-domains


Summary

Introduction

Chemical Graph Theory (CGT) applies graph theory to the combinatorial and topological exploration of chemical molecular structure. Examples of 2D artificial representations of DNA and protein sequences with potential in bioinformatics include the spectrum-like, star-like, Cartesian-type and four-color maps [1,2,3,4,5]. These DNA/RNA and protein maps can generally reveal useful higher-order information that lies beyond the primary structure, i.e. the distribution of nucleotides or amino acids in a 2D space. Four-color maps and their numerical characterization can cooperate with traditional homology search tools (e.g. BLAST, HMMs) to carry out an exhaustive exploration of functional signatures in highly diverse gene/protein families. Such exploration is effective when all family members are retrieved, including remote homologs. The new A-domain variants found in the proteome of Microcystis aeruginosa could reveal the presence of novel NRPS clusters.
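The cooperation between search methods described above can be illustrated with a minimal sketch. It assumes that each method (multi-template BLASTp, profile HMM, and an alignment-free TI-based model) has already produced a set of candidate sequence identifiers; the function name, method labels, and identifiers below are hypothetical and are not taken from TI2BioP or from the study's data. The sketch only shows how the union of hits can be taken while recording which methods support each candidate.

```python
from collections import defaultdict


def combine_hits(hits_by_method):
    """Union the hits of several search methods.

    hits_by_method maps a method label to a set of sequence identifiers.
    Returns a dict mapping each identifier to the set of methods that found it.
    """
    support = defaultdict(set)
    for method, seq_ids in hits_by_method.items():
        for seq_id in seq_ids:
            support[seq_id].add(method)
    return dict(support)


# Hypothetical hit lists; the labels and identifiers are placeholders only.
hits = {
    "blastp_multi_template": {"seqA", "seqB"},
    "profile_hmm": {"seqB", "seqC"},
    "alignment_free_ti": {"seqC", "seqD"},
}

combined = combine_hits(hits)

# Candidates reported only by the alignment-free model are the ones most
# likely to be missed by homology search, so they are listed separately.
only_alignment_free = sorted(
    seq_id
    for seq_id, methods in combined.items()
    if methods == {"alignment_free_ti"}
)

for seq_id in sorted(combined):
    print(seq_id, sorted(combined[seq_id]))
print("alignment-free only:", only_alignment_free)
```

Keeping the supporting methods for each candidate, rather than a flat union, makes it easy to see which putative remote homologs depend on a single method and may deserve closer inspection.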
