Abstract
Background
Multiple sequence alignments are a fundamental tool for the comparative analysis of proteins and nucleic acids. However, large data sets can no longer be visualized and investigated effectively with the traditional stacked sequence alignment representation.
Results
We introduce ProfileGrids, which represent a multiple sequence alignment as a matrix color-coded according to the frequency of each residue at each column position. JProfileGrid is a Java application for computing and analyzing ProfileGrids. Users can interact dynamically with the alignment information by changing the ProfileGrid color scheme, by extracting sequence subsets at selected residues of interest, and by relating alignment information to residue physical properties. Conserved family motifs can be identified by overlaying similarity plot calculations on a ProfileGrid. Figures suitable for publication can be generated from the saved spreadsheet output of the colored matrices, and conservation information can be exported for use in the PyMOL molecular visualization program.
We demonstrate the utility of ProfileGrids on 300 bacterial homologs of RecA, a universally conserved protein involved in DNA recombination and repair. Because ProfileGrids make rare residues in an alignment easy to identify, careful attention was paid to curating the collected RecA sequences. We relate sequence conservation in the RecA alignment to three topics: the recently identified DNA binding residues, the unexplored MAW motif, and a unique sequence feature of a Bacillus subtilis RecA homolog.
Conclusion
ProfileGrids allow large protein families to be visualized more effectively than the traditional stacked sequence alignment form.
This new graphical representation makes it easier to determine sequence conservation at residue positions of interest, enables the examination of structural patterns through residue physical properties, and displays rare sequence features within the context of an entire alignment. JProfileGrid is free for non-commercial use and is available from http://www.profilegrid.org. Furthermore, we present a curated RecA protein collection that is more diverse than previous data sets; the resulting RecA ProfileGrid is therefore a rich source of information for nanoanatomy analysis.
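The ProfileGrid idea described in the Results, a residue-by-column matrix of frequencies that is then color-coded, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not JProfileGrid's Java implementation: the row alphabet, the white-to-blue color ramp, the simplified hydrophobic residue class, and the helper names (`profile_grid`, `frequency_to_hex`, `extract_subset`, `class_frequency`) are all hypothetical, and the sequences are toy data rather than RecA homologs.

```python
from collections import Counter

# Assumed row alphabet: the 20 standard amino acids plus the gap character "-".
# Any other symbol found in the alignment (e.g. "X") gets its own row on the fly.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"

# Assumed, simplified hydrophobic class for illustration; property tables vary.
HYDROPHOBIC = set("AVILMFWC")


def profile_grid(alignment):
    """Return {residue: [frequency in column 0, column 1, ...]} for an MSA.

    Each value is the fraction of sequences carrying that residue in that
    column, so the values in every column sum to 1.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    n = len(alignment)
    grid = {res: [0.0] * length for res in ALPHABET}
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        for res, count in counts.items():
            grid.setdefault(res, [0.0] * length)[col] = count / n
    return grid


def frequency_to_hex(freq):
    """Hypothetical color scheme: white (absent) to pure blue (in every sequence)."""
    freq = min(max(freq, 0.0), 1.0)
    level = round(255 * (1.0 - freq))
    return f"#{level:02x}{level:02x}ff"


def extract_subset(names, alignment, column, residue):
    """Names of the sequences carrying `residue` at 0-based alignment `column`."""
    return [name for name, seq in zip(names, alignment) if seq[column] == residue]


def class_frequency(grid, residue_class, column):
    """Summed frequency of a residue class (e.g. HYDROPHOBIC) in one column."""
    return sum(grid[res][column] for res in residue_class if res in grid)


if __name__ == "__main__":
    names = ["seqA", "seqB", "seqC"]  # toy data, not real RecA sequences
    alignment = ["MKV-L", "MKVIL", "MRV-L"]
    grid = profile_grid(alignment)
    print(grid["M"][0])                              # 1.0
    print(frequency_to_hex(grid["M"][0]))            # #0000ff
    print(extract_subset(names, alignment, 3, "I"))  # ['seqB']
    print(class_frequency(grid, HYDROPHOBIC, 2))     # 1.0
```

Keeping one row per residue type, instead of one row per sequence, is what lets the matrix size stay fixed (alphabet × alignment length) no matter how many homologs are added, which is the property that makes large families viewable.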
Highlights
Multiple sequence alignments are a fundamental tool for the comparative analysis of proteins and nucleic acids
JProfileGrid is free for noncommercial use and is available from http://www.profilegrid.org
We present a curated RecA protein collection that is more diverse than previous data sets, and the resulting RecA ProfileGrid is a rich source of information for nanoanatomy analysis
Summary
Multiple sequence alignments (MSAs) are a fundamental tool for the comparative analysis of proteins and nucleic acids. MSA formatting programs facilitated analysis by emphasizing residues with boxes, colors, and shading [1,2,3]. These programs (and many later implementations) still represent an MSA as stacked sequences. A graphical view of MSA conservation can be obtained with an "overview" mode [7,8] or with plots of similarity values [9]. None of these representations conveys the frequency distribution of every character at each homologous position across the entire alignment. A new visual representation paradigm for MSAs is therefore needed.
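The limitation described above can be made concrete with a toy similarity plot: reduce each column to one conservation value, then smooth it with a sliding window. The sketch below assumes a deliberately simple score (the fraction of sequences sharing the most common non-gap residue); it is a hypothetical stand-in, not the scoring used in reference [9] or in JProfileGrid, where substitution-matrix-based similarity is more typical. The function names `column_conservation` and `sliding_window_average` are illustrative only.

```python
from collections import Counter


def column_conservation(alignment):
    """Per-column score: fraction of all sequences sharing the most common
    non-gap residue (an assumed, simplified conservation measure)."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    n = len(alignment)
    scores = []
    for column in zip(*alignment):
        counts = Counter(ch for ch in column if ch != "-")
        top = counts.most_common(1)
        scores.append(top[0][1] / n if top else 0.0)
    return scores


def sliding_window_average(scores, window=3):
    """Centered moving average, as in a similarity plot; the window is
    truncated at the alignment ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Two single-column toy alignments with different residue distributions...
print(column_conservation(["A", "A", "C", "D"]))  # [0.5]
print(column_conservation(["A", "A", "C", "C"]))  # [0.5]
# ...collapse to the same score, losing the distribution a ProfileGrid keeps.
```

Both toy columns score 0.5 even though one contains three residue types and the other two, which is the per-position detail the stacked, overview, and similarity-plot views discard.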