Abstract

The currently accepted dogma when analysing human Alu transposable elements is that ‘young’ Alu elements are found in low GC regions and ‘old’ Alus in high GC regions. The correlation between high GC regions and high gene frequency regions makes this observation particularly difficult to explain. Although a number of studies have tackled the problem, no analysis has definitively explained the reason for this trend. These observations have been made by relying on the subfamily as a proxy for the age of an element. In this study, we suggest that this is a misleading assumption and instead analyse the relationship between the taxonomic distribution of an individual element and its surrounding GC environment. An analysis of 103,906 Alu elements across six human chromosomes was carried out, using the presence of orthologous Alu elements in other primate species as a proxy for age. We show that the previously reported effect of GC content correlating with subfamily age is not reflected by the ages of the individual elements. Instead, elements are preferentially lost from areas of high GC content over time. The correlation between GC content and subfamily may be due to a change in insertion bias in the young subfamilies. The link between Alu subfamily age and GC region was made due to an over-simplification of the data and is incorrect. We suggest that the use of subfamilies as a proxy for age is inappropriate and that the analysis of ortholog presence in other primate species provides a deeper insight into the data.

Highlights

  • A small proportion of our genome is made up of sequences that code for proteins (Lander et al., 2001).

  • Were the Alu elements that have spread to fixation in human populations spread by a selective advantage that they conveyed? Polymorphisms for Alu insertions have been much studied as a tool in human population genetic inference, because the absence of the element can always be identified as the ancestral state (Batzer & Deininger, 2002).

  • In this study we investigate the relationship between the flanking GC content of an Alu element and its presence in modern primate species, assuming that elements found in multiple organisms at the same chromosomal location were inserted into a common ancestor species (Hellen & Brookfield, 2013).
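
The two quantities compared in the study, flanking GC content and ortholog presence as an age proxy, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the flank size of 1000 bp, the species list and its ordering, and the function names `flanking_gc` and `deepest_shared_species` are all assumptions made for illustration.

```python
from typing import Dict, Sequence

# Hypothetical species order, from the closest to the most distant relative
# of human. The species actually used in the study may differ.
SPECIES_BY_DIVERGENCE = ("chimpanzee", "gorilla", "orangutan", "macaque", "marmoset")


def flanking_gc(chrom_seq: str, start: int, end: int, flank: int = 1000) -> float:
    """GC fraction of the bases flanking an element at [start, end).

    The element itself is excluded. The default flank size is an assumption.
    """
    left = chrom_seq[max(0, start - flank):start]
    right = chrom_seq[end:end + flank]
    # Ignore ambiguous bases such as N so they do not dilute the fraction.
    called = [base for base in (left + right).upper() if base in "ACGT"]
    if not called:
        raise ValueError("no unambiguous bases in the flanking windows")
    return sum(base in "GC" for base in called) / len(called)


def deepest_shared_species(
    orthologs: Dict[str, bool],
    order: Sequence[str] = SPECIES_BY_DIVERGENCE,
) -> str:
    """Most distant species carrying an ortholog at the same location.

    Used here as an age proxy: an element shared with a more distant species
    is assumed to have inserted in an older common ancestor.
    """
    deepest = "human-specific"
    for species in order:
        if orthologs.get(species, False):
            deepest = species
    return deepest


# Toy example: an 8 bp "element" with 4 bp flanks on either side.
toy_chrom = "GCGC" + "AAAAAAAA" + "ATAT"
toy_gc = flanking_gc(toy_chrom, start=4, end=12, flank=4)
toy_age = deepest_shared_species({"chimpanzee": True, "macaque": True})
```

In the toy example the element is present in macaque but has no recorded ortholog in gorilla or orangutan. The sketch still dates it to the human-macaque ancestor, which treats the gaps as later losses rather than independent insertions.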

Introduction

A small proportion of our genome is made up of sequences that code for proteins (Lander et al., 2001). Many functional DNA sequences have been derived from Alu sequences: Alus have contributed to the control of transcription by supplying transcription factor-binding regions (Laperriere et al., 2007; Polak & Domany, 2006; Cowley & Oakey, 2013), they are involved in alternative splicing (Li et al., 2001; Nekrutenko & Li, 2001), and they supply transcriptional start sites for antisense transcripts used in gene regulation (Conley, Miller & Jordan, 2008). These properties of some Alu sequences are unlikely to be neutral in their selective effects. In principle, such sequences could be weakly harmful and yet could have spread to fixation by drift in small ancestral populations. Almost all polymorphic Alu sites are in non-coding regions, supporting a model in which Alu insertions into functional sequences are selectively harmful and rapidly eliminated, while the remaining insertions lie in non-functional regions and spread through populations by genetic drift, not selection.
