FASconCAT-G: extensive functions for multiple sequence alignment preparations concerning phylogenetic studies.

Patrick Kück, Gary C Longo

doi:10.1186/s12983-014-0081-x

Abstract

Background: Phylogenetic and population genetic studies often deal with multiple sequence alignments that require manipulation or processing steps such as sequence concatenation, sequence renaming, sequence translation or consensus sequence generation. In recent years, phylogenetic data sets have expanded from single genes to genome-wide markers comprising hundreds to thousands of loci. Processing these large phylogenomic data sets is impracticable without automated pipelines. Currently, no stand-alone or pipeline-compatible program exists that offers a broad range of manipulation and processing steps for multiple sequence alignments in a single process run.

Results: Here we present FASconCAT-G, a system-independent editor that offers various processing options for multiple sequence alignments. The software provides a wide range of possibilities to edit and concatenate multiple nucleotide, amino acid, and structure sequence alignment files for phylogenetic and population genetic purposes. The main options include sequence renaming, file format conversion, sequence translation between nucleotide and amino acid states, consensus generation of specific sequence blocks, sequence concatenation, model selection of amino acid replacement with ProtTest, two types of RY coding, site exclusion, and extraction of parsimony informative sites. Conveniently, most options can be invoked in combination and performed during a single process run. Additionally, FASconCAT-G prints useful information regarding alignment characteristics and editing processes, such as base compositions of single input and output files, sequence areas in a concatenated supermatrix, and paired stem and loop regions in secondary structure sequence strings.

Conclusions: FASconCAT-G is a command-line driven Perl program that delivers computationally fast and user-friendly processing of multiple sequence alignments for phylogenetic and population genetic applications and is well suited for incorporation into analysis pipelines.

Electronic supplementary material: The online version of this article (doi:10.1186/s12983-014-0081-x) contains supplementary material, which is available to authorized users.

Highlights

Phylogenetic and population genetic analyses commonly involve the manipulation and processing of multiple sequence alignments.
Recent studies searching for genes potentially under selection among populations relied on identifying the most common allele at polymorphic sites as well as alleles fixed within populations [38], which can be accomplished through consensus generation.
With FASconCAT-G (FcC-G), we introduce versatile software designed for processing and manipulating multiple sequence alignments.

Summary

Introduction

Phylogenetic and population genetic analyses commonly involve the manipulation and processing of multiple sequence alignments. Another common analysis of multiple sequence alignments is consensus sequence generation, which is used to identify and compare conserved and variable regions. Such studies often require processing steps such as sequence concatenation, sequence renaming, sequence translation or consensus sequence generation. In recent years, phylogenetic data sets have expanded from single genes to genome-wide markers comprising hundreds to thousands of loci. Processing these large phylogenomic data sets is impracticable without automated pipelines. No stand-alone or pipeline-compatible program exists that offers a broad range of manipulation and processing steps for multiple sequence alignments in a single process run.
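To make two of the processing steps named above concrete, the following minimal Python sketch illustrates sequence concatenation into a supermatrix and majority-rule consensus generation. It is not FASconCAT-G's implementation (the program itself is written in Perl) and does not reproduce its options. The function names `concatenate` and `majority_consensus`, the use of `N` to pad taxa missing from a locus, and the tie-handling rule are all assumptions made for illustration.

```python
from collections import Counter


def concatenate(alignments, missing="N"):
    """Join per-locus alignments into one supermatrix (illustrative sketch).

    `alignments` is a list of dicts mapping taxon name -> aligned sequence.
    Taxa absent from a locus are padded with `missing` (an assumption of this
    sketch) so that every row of the supermatrix has the same length.
    Returns (supermatrix, partitions); partitions holds the 1-based,
    inclusive (start, end) coordinates of each locus in the supermatrix.
    """
    taxa = sorted({name for aln in alignments for name in aln})
    rows = {name: [] for name in taxa}
    partitions = []
    offset = 0
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within an alignment must have equal length")
        length = lengths.pop()
        for name in taxa:
            rows[name].append(aln.get(name, missing * length))
        partitions.append((offset + 1, offset + length))
        offset += length
    return {name: "".join(parts) for name, parts in rows.items()}, partitions


def majority_consensus(sequences, ambiguous="N"):
    """Return the most common character per column.

    Assumes all sequences have equal length. A tie between the two most
    frequent characters is reported as `ambiguous` (hypothetical rule).
    """
    consensus = []
    for column in zip(*sequences):
        top = Counter(column).most_common(2)
        if len(top) > 1 and top[0][1] == top[1][1]:
            consensus.append(ambiguous)
        else:
            consensus.append(top[0][0])
    return "".join(consensus)


if __name__ == "__main__":
    gene1 = {"taxonA": "ACGT", "taxonB": "ACGA"}
    gene2 = {"taxonA": "GG-C", "taxonC": "GGTC"}
    matrix, parts = concatenate([gene1, gene2])
    for name, seq in matrix.items():
        print(name, seq)
    print(parts)
    print(majority_consensus(["ACGT", "ACGA", "ACTA"]))
```

Recording the partition coordinates alongside the supermatrix mirrors the kind of information the abstract says is reported for sequence areas in a concatenated supermatrix. The exact output format of the real program is not specified here.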

Methods

Results

Conclusion