Abstract

GC and TA skews are fundamental genomic features observed widely from bacteria to humans. They have been described as replication- and transcription-coupled skews because they arise mainly from differences in mutation and repair biases between the complementary DNA strands during replication and transcription. The skews are clearly observed in bacteria, plants, and birds but are more difficult to identify accurately in mammals, including humans. The main reason for this difficulty is that variations in GC skew and TA skew are not highly correlated in mammalian genomes. In this study, we focused not only on these mononucleotide skews but also on di- and trinucleotide skews. First, using conventional distribution map analysis of 100-kb sequences from two short human chromosomes (chr21 and chr22), we clearly observed these skews across both chromosomes, except in centromeric heterochromatin regions, where diverse repetitive sequences are densely clustered. Then, using a batch learning self-organizing map (BLSOM), which can display large amounts of sequence data at once, we detected the oligonucleotide skews across nearly the entire human genome. Next, to understand skews within centromeric heterochromatin regions, we focused on Alu sequences, which are ubiquitously distributed in the human genome, and analyzed mono-, di-, and trinucleotide skews with reference to the polarity of the Alu sequences. Because this polarity is unevenly distributed in the human genome, a clear TA skew was observed, reflecting the evident richness of A relative to T in Alu elements. However, because Alu elements are also somewhat richer in G than in C, TA and GC skews showed inversely correlated variations when Alu sequences were analyzed alone.
Notably, when we analyzed genome sequences after excluding Alu sequences, the concordant correlation between GC- and TA-skew variations increased, indicating that the abundance of Alu sequences has made these mononucleotide skews more difficult to detect in the human genome than in other organisms. Analysis of oligonucleotide skews using BLSOM is thought to be suitable for characterizing sequence features in skew transition regions.
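
For readers unfamiliar with the quantities involved, the following is a minimal Python sketch of how mono- and oligonucleotide skews might be computed per window. It assumes the commonly used definitions, GC skew = (G − C)/(G + C) and TA skew = (T − A)/(T + A), and treats an oligonucleotide skew as the normalized difference between the count of a k-mer and the count of its reverse complement. The abstract does not state the exact normalization or sign convention used in the study, so these definitions, and the function names `mono_skews`, `windowed_skews`, and `oligo_skew`, are illustrative assumptions rather than the authors' implementation. Only the 100-kb window size is taken from the text.

```python
from collections import Counter

# Complement table for reverse-complementing a k-mer.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mono_skews(seq):
    """Return (gc_skew, ta_skew) for one sequence window.

    Assumed definitions: (G - C)/(G + C) and (T - A)/(T + A).
    A skew is reported as 0.0 when its denominator is zero.
    """
    counts = Counter(seq.upper())
    g, c, t, a = counts["G"], counts["C"], counts["T"], counts["A"]
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    ta_skew = (t - a) / (t + a) if (t + a) else 0.0
    return gc_skew, ta_skew


def windowed_skews(seq, window=100_000):
    """Compute skews over consecutive, non-overlapping windows.

    The 100-kb default mirrors the window size mentioned in the abstract.
    """
    return [mono_skews(seq[i:i + window]) for i in range(0, len(seq), window)]


def oligo_skew(seq, kmer):
    """Assumed oligonucleotide skew: (n_kmer - n_revcomp)/(n_kmer + n_revcomp).

    Occurrences are counted with overlap. Self-complementary k-mers
    (e.g. "AT") have no strand asymmetry by construction, so 0.0 is returned.
    """
    seq, kmer = seq.upper(), kmer.upper()
    rev = kmer.translate(_COMPLEMENT)[::-1]
    if kmer == rev:
        return 0.0
    k = len(kmer)
    n_fwd = n_rev = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word == kmer:
            n_fwd += 1
        elif word == rev:
            n_rev += 1
    total = n_fwd + n_rev
    return (n_fwd - n_rev) / total if total else 0.0
```

Under these assumed definitions, a window with more G than C and more T than A yields positive GC and TA skews, which corresponds to the concordant variation discussed above. A window dominated by same-orientation Alu elements, rich in A relative to T but somewhat rich in G relative to C, would yield skews of opposite sign.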

