Abstract

Previous studies have shown that Mixed Lineage Leukemia 1 (MLL1 or MLL) binds a group of CpG-rich motifs known as morphemes. To examine whether occurrences of MLL1 morphemes in genomic DNA influence codon utilization, we analyzed the frequencies of 9-mers in human cDNAs and in total human genomic DNA. In coding sequences (CDSs) that included MLL1 morphemes, we uncovered preferential utilization of GGC for Gly, GCG for Ala, CCG for Pro, and TCG for Ser. We also examined weighted occurrences of CDS 9-mers in a 30-base window that moved along each human chromosome. Plots of these occurrences showed peaks of fluctuating intensity. High-intensity peaks appeared within promoters and exons located in CpG islands, spanning sequences that included MLL1 morphemes. These high-intensity peaks included CCG/GGC repeats, whose expansion may cause neurological disorders and congenital malformations. Such repeats arise from overlapping copies of a morpheme (CGCCG/CGGCG), which, depending on reading frame and orientation, would produce runs of Ala, Gly, or Pro in proteins. Overall, our results point to a role for morpheme occurrences in synonymous codon utilization in human genomic DNA and indicate that regulatory instructions are dispersed not only in promoters but also in exons of human genes.
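The 9-mer counting, the sliding 30-base window, and the reading-frame argument about CCG/GGC repeats can be illustrated with a small sketch. It is not the authors' pipeline: the abstract does not say how 9-mers were weighted, so `window_scores` simply sums a caller-supplied, hypothetical per-9-mer weight over every 9-mer that fits inside each window. The helper names are illustrative, and the codon table is deliberately partial, covering only codons that occur in CCG/GGC repeats.

```python
from collections import Counter

# Partial codon table (illustrative only): just the codons found in
# the three frames of (CCG)n and of its reverse complement (CGG)n.
CODONS = {
    "CCG": "P", "CGC": "R", "GCC": "A",  # frames of (CCG)n
    "CGG": "R", "GGC": "G", "GCG": "A",  # frames of (CGG)n
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_counts(seq: str, k: int = 9) -> Counter:
    """Count overlapping k-mers in seq (k=9 matches the abstract)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def window_scores(seq: str, weights: dict, k: int = 9, window: int = 30) -> list:
    """Slide a `window`-base window one base at a time and sum the
    (hypothetical) weight of every k-mer lying fully inside it.
    Unknown k-mers contribute 0."""
    per_pos = [weights.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1)]
    n = window - k + 1  # number of k-mers that fit in one window
    return [sum(per_pos[i:i + n]) for i in range(len(per_pos) - n + 1)]


def translate_frames(seq: str) -> list:
    """Translate the three forward frames; 'X' marks codons absent
    from the partial table."""
    frames = []
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        frames.append("".join(CODONS.get(c, "X") for c in codons))
    return frames


if __name__ == "__main__":
    repeat = "CCG" * 6
    # Overlapping morpheme copies: CGCCG occurs 5 times inside (CCG)6.
    print(kmer_counts(repeat, 5)["CGCCG"])
    print(translate_frames(repeat))
    print(translate_frames(reverse_complement(repeat)))
    print(window_scores("CCG" * 10, {"CCGCCGCCG": 1.0}))
```

Run on `(CCG)6`, the forward frames give runs of Pro, Arg, and Ala (`PPPPPP`, `RRRRR`, `AAAAA`), and the reverse-complement frames give Arg, Gly, and Ala (`RRRRRR`, `GGGGG`, `AAAAA`). This shows how a single repeat can encode Ala, Gly, or Pro runs depending on frame and orientation, and it also yields Arg in two of the frames. Uniform weights, as in the last line, only count matches; a weight table derived from CDS 9-mer frequencies would be one plausible substitute, but that choice is an assumption here.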
