The rate and extent of eukaryotic intron changes are unbalanced: they vary across lineages of species and across functional groups of genes whose spatio-temporal expression modes are shaped by selective pressure. To date, only a few key conserved splicing signals and regulatory elements have been identified in introns, and little is known about the remaining intronic regions. To trace the evolutionary trajectory of spliceosomal introns from available genomes under a unified framework, we present IntronDB, which catalogs ∼50 000 000 introns from over 1000 genomes spanning the major eukaryotic clades in the tree of life. Based on the position of introns relative to coding regions, it categorizes introns into three groups (5'UTR, CDS and 3'UTR) and subsequently divides CDS introns into three categories (phase 0, phase 1 and phase 2). It provides a quality evaluation for each sequence entry and characterizes intronic parameters, including number, size, sequence composition and positioning information, together with the corresponding features of exons and genes, enabling comparisons between introns and exons. It reports the dinucleotides around the intron boundaries and displays the consensus sequence features for all introns, small introns and large introns in each genome. By incorporating the taxonomic assignment of genomes, it performs high-level or genome-wide statistical analysis of single features and coupled features, both within a single genome and across multiple genomes. It offers functionality to browse the data from representative protein-coding transcripts and to download the data from all transcripts of protein-coding genes. IntronDB is available at http://www.nextgenbioinformatics.org/IntronDB. Supplementary data are available at Bioinformatics online.
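The position-based categorization described above can be illustrated with a minimal Python sketch. It is not IntronDB's implementation: the function name `classify_introns`, the 1-based inclusive coordinates and the restriction to forward-strand transcripts are all assumptions made for illustration. It relies only on the standard definition of intron phase, namely the number of coding bases upstream of the intron modulo 3.

```python
from typing import List, Tuple


def classify_introns(
    exons: List[Tuple[int, int]], cds_start: int, cds_end: int
) -> List[str]:
    """Label each intron of a forward-strand transcript (hypothetical helper).

    exons: (start, end) pairs, 1-based inclusive, sorted by position.
    cds_start, cds_end: first and last coding base, 1-based inclusive.
    """
    labels = []
    coding_bases = 0  # coding bases seen upstream of the current intron
    for (start, end), (next_start, _) in zip(exons, exons[1:]):
        # Coding part of this exon = its overlap with the CDS interval.
        overlap = min(end, cds_end) - max(start, cds_start) + 1
        coding_bases += max(0, overlap)
        if end < cds_start:
            # The upstream exon ends before translation starts.
            labels.append("5'UTR")
        elif next_start > cds_end:
            # The downstream exon begins after the stop codon.
            labels.append("3'UTR")
        else:
            # Phase 0: between codons; phase 1 or 2: after the first
            # or second base of a codon.
            labels.append(f"CDS phase {coding_bases % 3}")
    return labels


if __name__ == "__main__":
    exons = [(1, 50), (101, 160), (201, 250), (301, 400), (501, 600)]
    print(classify_introns(exons, cds_start=121, cds_end=350))
```

A reverse-strand transcript could be handled by mirroring the coordinates before calling the function; that step is omitted here to keep the sketch short.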