Abstract

Transposable elements (TEs) make up about half of the human genome, but the extent of their contribution to cryptic exon activation that results in genetic disease is unknown. Here, a comprehensive survey of 78 mutation-induced cryptic exons previously identified in 51 disease genes revealed the presence of TEs in 40 cases (51%). Most TE-containing exons were derived from short interspersed nuclear elements (SINEs), with Alus and mammalian interspersed repeats (MIRs) covering >18% and >16% of the exonized sequences, respectively. The majority of SINE-derived cryptic exons had splice sites at the same positions of the Alu/MIR consensus as existing SINE exons, and their inclusion in the mRNA was facilitated by phylogenetically conserved changes that improved both traditional and auxiliary splicing signals, thus marking intronic TEs amenable to pathogenic exonization. The overrepresentation of MIRs among TE exons is likely to result from their high average exon inclusion levels, which reflect their strong splice sites, a lack of splicing silencers and a high density of enhancers, particularly (G)AA(G) motifs. These elements were markedly depleted in antisense Alu exons, had the most prominent position on the exon-intron gradient scale and are proposed to promote exon definition through enhanced tertiary RNA interactions involving unpaired (di)adenosines. The identification of common mechanisms by which the most dynamic parts of the genome contribute both to new exon creation and genetic disease will facilitate detection of intronic mutations and the development of computational tools that predict TE hot-spots of cryptic exon activation.

