Non-B DNA G-quadruplex (G4) structures with guanine (G) runs of 2-4 repeats have shown opposing transcriptional effects in experiments. Here, we employed bioinformatic algorithms to comprehensively assess correlations of steady-state RNA transcript levels with all putative G4 sequence (pG4) locations genome-wide in three mammalian genomes and in normal and tumor human tissues. The human pG4-containing gene set displays higher expression levels than the set without pG4, supporting and extending some prior observations. pG4 enrichment at transcription start sites (TSS) in the human genome, but not in the chimpanzee or mouse genomes, suggests possible positive selection pressure for pG4 at human TSS, potentially driving genome rewiring and gene expression divergence between human and chimpanzee. Comprehensive bioinformatic analyses revealed lower variability of the pG4-containing gene set in humans and among different pG4 genes in tumors. As G4 stabilizers are under therapeutic consideration for cancer and pathogens, such distinctions between G4s in human normal and tumor tissues, and between human and other species, merit attention. Furthermore, in germline and cancer sequences, the most mutagenic pG4 mapped to regions promoting alternative DNA structures. Overall, the findings establish high pG4 density at TSS as a human genome attribute statistically associated with robust, well-coordinated transcription and reduced cancer transcriptome variation, with implications for biology, model organisms, and medicine.
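To make the notion of a putative G4 sequence concrete, the following is a minimal sketch of a regular-expression scan for pG4 motifs. It is not the algorithm used in this study: the function name `find_pg4`, the requirement of four G-runs, and the loop length of 1-7 nucleotides are assumptions borrowed from the widely used quadruplex-folding motif convention, and `min_run` is a hypothetical parameter for choosing the minimum G-run length.

```python
import re


def find_pg4(seq, min_run=3, max_loop=7):
    """Return sorted (start, end, strand) tuples for pG4-like motifs.

    Assumed motif (illustrative, not taken from the study): four runs of
    at least `min_run` guanines separated by loops of 1..max_loop
    nucleotides. C-runs on the given strand are reported as '-' hits,
    since they correspond to G-runs on the complementary strand.
    Matches on each strand are non-overlapping (re.finditer semantics).
    """
    seq = seq.upper()
    hits = []
    for base, strand in (("G", "+"), ("C", "-")):
        # Example for base="G", min_run=3, max_loop=7:
        #   G{3,}(?:[ACGT]{1,7}G{3,}){3}
        pattern = re.compile(
            rf"{base}{{{min_run},}}"
            rf"(?:[ACGT]{{1,{max_loop}}}{base}{{{min_run},}}){{3}}"
        )
        for match in pattern.finditer(seq):
            hits.append((match.start(), match.end(), strand))
    return sorted(hits)


if __name__ == "__main__":
    example = "TTGGGAGGGTGGGCGGGTT"
    for start, end, strand in find_pg4(example):
        print(start, end, strand, example[start:end])
```

Coordinates are 0-based and half-open. Lowering `min_run` to 2 admits the shorter G-runs mentioned above at the cost of many more candidate sites, so any threshold used in practice would need to be justified against the analysis at hand.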