Abstract

Background: The human genome encodes over 14,000 pseudogenes that are evolutionary relics of protein-coding genes and commonly considered nonfunctional. Emerging evidence suggests that some pseudogenes may exert important functions. However, the extent to which human pseudogenes are functionally relevant remains unclear. There has been no large-scale characterization of pseudogene function because of technical challenges, including high sequence similarity between pseudogenes and their parent genes, and poor annotation of transcription start sites.

Results: To overcome these technical obstacles, we develop an integrated computational pipeline to design the first genome-wide library of CRISPR interference (CRISPRi) single-guide RNAs (sgRNAs) that target human pseudogene promoter-proximal regions. We perform the first pseudogene-focused CRISPRi screen in luminal A breast cancer cells and reveal approximately 70 pseudogenes that affect breast cancer cell fitness. Among the top hits, we identify a cancer-testis unitary pseudogene, MGAT4EP, that is predominantly localized in the nucleus and interacts with FOXA1, a key regulator in luminal A breast cancer. By enhancing the promoter binding of FOXA1, MGAT4EP upregulates the expression of the oncogenic transcription factor FOXM1. Integrative analyses of multi-omic data from the Cancer Genome Atlas (TCGA) reveal many unitary pseudogenes whose expression is significantly dysregulated and/or associated with overall/relapse-free survival of patients in diverse cancer types.

Conclusions: Our study represents the first large-scale study characterizing pseudogene function. Our findings suggest the importance of the nuclear function of unitary pseudogenes and underscore their underappreciated roles in human diseases. The functional genomic resources developed here will greatly facilitate the study of human pseudogene function.

Highlights

  • Pseudogenes are defined as dysfunctional copies of protein-coding genes that have lost their coding potential due to the accumulation of disruptive mutations such as premature stop codons and frame-shift insertions/deletions [1, 2]

  • Validating top pseudogene hits with upregulated expression in breast cancer: To validate the top pseudogene hits from our screen that are relevant to breast cancer, we focused on the pseudogenes whose targeting single-guide RNAs (sgRNAs) showed the strongest growth-inhibitory effect in MCF7 cells and that were significantly upregulated in breast cancer compared with normal breast tissues (Fig. 3A; log2 fold-change ≥ log2(1.5) and FDR < 0.05, “Methods”)

  • Consistent with the results obtained with the CRISPR interference (CRISPRi) method, we found that effective small interfering RNA (siRNA)-mediated depletion of MGAT4EP (Additional File 2: Fig. S3F) inhibited the growth of both MCF7 and T47D, two independent luminal A breast cancer cell lines (Additional File 2: Fig. S3G)
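
The upregulation criterion quoted in the highlights (log2 fold-change ≥ log2(1.5) and FDR < 0.05) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `is_upregulated`, the record layout, and the example values are hypothetical and serve only to show how the two thresholds combine.

```python
import math

# Thresholds quoted in the text: log2 fold-change >= log2(1.5) and FDR < 0.05.
LOG2FC_MIN = math.log2(1.5)
FDR_MAX = 0.05


def is_upregulated(log2_fold_change: float, fdr: float) -> bool:
    """Return True when a gene passes both upregulation thresholds."""
    return log2_fold_change >= LOG2FC_MIN and fdr < FDR_MAX


# Hypothetical records (name, log2 fold-change, FDR); values are made up.
candidates = [
    ("PSEUDO_A", 1.20, 0.001),  # passes both thresholds
    ("PSEUDO_B", 0.40, 0.010),  # fold-change too small
    ("PSEUDO_C", 0.90, 0.200),  # FDR too high
]

upregulated = [name for name, lfc, fdr in candidates if is_upregulated(lfc, fdr)]
print(upregulated)
```

Both conditions must hold at once, so a large fold-change with a weak FDR (or the reverse) is excluded.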


Summary

Introduction

Pseudogenes are defined as dysfunctional copies of protein-coding genes that have lost their coding potential due to the accumulation of disruptive mutations such as premature stop codons and frame-shift insertions/deletions [1, 2]. Pseudogenes are evolutionary relics present in the genomes of a wide variety of species, including bacteria, plants, and metazoans [3, 4]. They are often lineage-specific throughout evolution, and mammalian genomes contain many more pseudogenes than those of other metazoan species [4]. Based on the mechanism by which they were generated during evolution, pseudogenes can be categorized into three major classes: (1) unprocessed (also referred to as duplicated) pseudogenes, derived from duplication of protein-coding genes; (2) processed pseudogenes, generated by retrotransposition of mRNA transcribed from protein-coding genes back into the genome; and (3) unitary pseudogenes, which arise through mutations in previously functional protein-coding genes without gene duplication. There has been no large-scale characterization of pseudogene function because of technical challenges, including high sequence similarity between pseudogenes and their parent genes, and poor annotation of transcription start sites.
