Abstract

Several strongly conserved DNA sequence patterns in and between introns and intergenic regions (IIRs), consisting of short tandem repeats (STRs) with repeat lengths <3 bp, have already been described in the kingdom Animalia. In this work, we extended the search for and analysis of conserved DNA sequence patterns to a wider range of eukaryotic genomes. Our aims were to confirm the conservation of these patterns, to support the hypothesis that they are functionally constrained, and to identify previously unknown patterns. We performed pairwise comparisons of genomic DNA sequences of genes, exons, CDS, introns and intergenic regions of 34 Embryophyta (land plants), 30 Protista and 29 Fungi using established k-mer-based (alignment-free) comparison methods. Additionally, the results were compared with values derived for Animalia in earlier studies. We confirmed strong correlations between the sequence structures of IIRs across the entire domain of eukaryotes. We found that the high correlations within introns, within intergenic regions and between the two result from conserved abundances of STRs with repeat units ≤2 bp (e.g., (AT)n). For some sequence patterns and their inverse complementary sequences, we found a violation of equal distribution on complementary DNA strands in a subset of genomes. Looking at mismatches within the identified STR patterns, we found specific preferences for certain nucleotides that are stable across all four phylogenetic kingdoms. We conclude that all of these conserved patterns between IIRs indicate a shared function of these sequence structures related to STRs.

Introduction

Genome regions encoding the chemical structures of proteins, such as genes, exons or CDS (coding DNA sequences), are known to harbor functional sequence structures (amino acid codons) that are conserved within a wide phylogenetic range [1]. While the remaining "non-coding" regions (introns and intergenic regions (IIRs)) were initially declared useless "junk" DNA [2,3], the existence and importance of conserved sequence structures in IIRs have become increasingly clear over the last decades [4,5]. More recent studies found conserved structures when comparing different regions, e.g., between introns and intergenic regions [8]. While the genomes of Animalia species were analyzed in [8], the aim of this study was to search for conserved sequence structures within IIRs of Embryophyta, Protista and Fungi and to compare the results between the kingdoms (including Animalia). Standard sequence analysis tools, such as the NCBI Basic Local Alignment Search Tool (BLAST) [9], cannot be used effectively to search for such structures within regions of sizes comparable to entire genomes [10]. We used a simple but powerful method called k-mer analysis, designed for this special task [12].
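To illustrate the general idea of an alignment-free, k-mer-based comparison, the following minimal sketch counts the relative frequencies of all k-mers in two sequences and correlates the two frequency vectors. It is not the pipeline used in this study: the function names, the choice of k = 2, the use of the Pearson correlation as similarity measure and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_frequencies(seq, k):
    """Relative frequency of every possible k-mer over the alphabet A, C, G, T.

    Windows containing other symbols (e.g. N) are ignored. The result is
    ordered lexicographically, so vectors of different sequences are comparable.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    return [counts[m] / total for m in kmers]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy sequences, not real genomic data.
intron_like = "AT" * 50 + "GATTACA"            # dominated by an (AT)n repeat
intergenic_like = "TA" * 40 + "CCGTA"          # same repeat, shifted by one base
coding_like = "ATGGCCAAGCTGACCGGTTCA" * 5      # mixed composition

k = 2
f_intron = kmer_frequencies(intron_like, k)
f_intergenic = kmer_frequencies(intergenic_like, k)
f_coding = kmer_frequencies(coding_like, k)

r_iir = pearson(f_intron, f_intergenic)
r_coding = pearson(f_intron, f_coding)
print(f"intron vs. intergenic: {r_iir:.3f}")
print(f"intron vs. coding:     {r_coding:.3f}")
```

Because both toy IIR sequences are dominated by the same dinucleotide repeat, their k-mer spectra correlate more strongly with each other than with the mixed-composition sequence. This only mirrors the kind of signal described above in miniature; real analyses operate on whole-genome region sets.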
