Abstract

BackgroundFrameshift translation is an important phenomenon that contributes to the appearance of novel coding DNA sequences (CDS) and functions in gene evolution, by allowing alternative amino acid translations of gene coding regions. Frameshift translations can be identified by aligning two CDS, from a same gene or from homologous genes, while accounting for their codon structure. Two main classes of algorithms have been proposed to solve the problem of aligning CDS, either by amino acid sequence alignment back-translation, or by simultaneously accounting for the nucleotide and amino acid levels. The former does not allow to account for frameshift translations and up to now, the latter exclusively accounts for frameshift translation initiation, not considering the length of the translation disruption caused by a frameshift.ResultsWe introduce a new scoring scheme with an algorithm for the pairwise alignment of CDS accounting for frameshift translation initiation and length, while simultaneously considering nucleotide and amino acid sequences. The main specificity of the scoring scheme is the introduction of a penalty cost accounting for frameshift extension length to compute an adequate similarity score for a CDS alignment. The second specificity of the model is that the search space of the problem solved is the set of all feasible alignments between two CDS. Previous approaches have considered restricted search space or additional constraints on the decomposition of an alignment into length-3 sub-alignments. The algorithm described in this paper has the same asymptotic time complexity as the classical Needleman–Wunsch algorithm.ConclusionsWe compare the method to other CDS alignment methods based on an application to the comparison of pairs of CDS from homologous human, mouse and cow genes of ten mammalian gene families from the Ensembl-Compara database. The results show that our method is particularly robust to parameter changes as compared to existing methods. It also appears to be a good compromise, performing well both in the presence and absence of frameshift translations. An implementation of the method is available at https://github.com/UdeS-CoBIUS/FsePSA.

Highlights

  • Frameshift translation is an important phenomenon that contributes to the appearance of novel coding DNA sequences (CDS) and functions in gene evolution, by allowing alternative amino acid translations of gene coding regions

  • A classical approach for aligning two CDS used in most alignment tools [14, 15] consists in a three-step method, where the CDS are first translated into amino acid (AA) sequences using their actual coding frame, AA sequences are aligned, and the AA alignment is back-translated to a CDS alignment

  • Data We evaluated the algorithm through applications on a mammalian dataset containing CDS sequences from ten gene families obtained from the database EnsemblCompara version 83 [26]

Read more

Summary

Introduction

Frameshift translation is an important phenomenon that contributes to the appearance of novel coding DNA sequences (CDS) and functions in gene evolution, by allowing alternative amino acid translations of gene coding regions. Two main classes of algorithms have been proposed to solve the problem of aligning CDS, either by amino acid sequence alignment back-translation, or by simultaneously accounting for the nucleotide and amino acid levels. FS translation consists in alternative AA translations of a coding region of DNA using different translation frames [5] It is an important phenomenon resulting from different scenarios such as, insertion or deletion of a Jammali et al Algorithms Mol Biol (2017) 12:10 nucleotide sequence whose length is not a multiple of 3 in a CDS through alternative splicing [6, 7] or evolutionary genomic indels [8, 9], programmed ribosomal frameshifting [10], or sequencing errors [11]. The opposite problem of aligning protein sequences while recovering their hypothetical nucleotide CDS sequences and accounting for FS translation was studied in several papers [16, 17]

Objectives
Methods
Results
Discussion
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.