Abstract

The completion of human genome sequences and the advancement of next-generation sequencing technologies have engendered a clear understanding of all human genes. Overlapping genes are usually observed in compact genomes, such as those of bacteria and viruses. Notably, overlapping protein-coding genes also exist in the human genome. Accordingly, we used the current Ensembl gene annotations to identify overlapping human protein-coding genes. We analysed 19,200 well-annotated protein-coding genes and determined that 4,951 of them overlapped with their adjacent genes; that is, approximately a quarter (about 25.8%) of all human protein-coding genes were overlapping genes. We observed clusters of overlapping protein-coding genes of different sizes, ranging from two genes (paired overlapping genes) to 22 genes. We also divided the paired overlapping protein-coding genes into four subtypes. We found that the divergent overlapping gene subtype had a stronger expression association than did the 5ʹ-tandem overlapping and 3ʹ-tandem overlapping subtypes. The majority of paired overlapping genes exhibited comparable coincidental tissue expression profiles; however, a few overlapping gene pairs displayed distinctive tissue expression association patterns. In summary, we carefully examined the genomic features and distribution of human overlapping protein-coding genes and found coincidental expression in tissues for most overlapping protein-coding genes.

Highlights

  • The completion of human genome sequences and the advancement of next-generation sequencing technologies have engendered a clear understanding of all human genes

  • Previous studies on overlapping genes have encountered major challenges regarding the annotation of natural antisense transcripts (NATs)[7]

  • Using a simple criterion based on shared/overlapped genomic regions, we found that 4,951 human protein-coding genes overlap in terms of their physical gene boundaries (Supplementary Table 1)
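
The boundary-based criterion in the last highlight can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Gene` record, the `overlap_clusters` function, and the toy coordinates are hypothetical, and the sketch assumes that genes sharing at least one base on the same chromosome count as overlapping regardless of strand, and that overlaps chain transitively into clusters.

```python
from collections import defaultdict
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str


def overlap_clusters(genes):
    """Group genes whose boundaries share at least one base.

    Assumption: overlaps chain transitively, so if A overlaps B and
    B overlaps C, all three land in one cluster. Strand is ignored.
    """
    by_chrom = defaultdict(list)
    for gene in genes:
        by_chrom[gene.chrom].append(gene)

    clusters = []
    for chrom_genes in by_chrom.values():
        # Sweep left to right, tracking the furthest end seen so far.
        chrom_genes.sort(key=lambda g: (g.start, g.end))
        current, current_end = [], None
        for gene in chrom_genes:
            if current and gene.start <= current_end:
                current.append(gene)
                current_end = max(current_end, gene.end)
            else:
                if len(current) > 1:
                    clusters.append(current)
                current, current_end = [gene], gene.end
        if len(current) > 1:
            clusters.append(current)
    return clusters


# Toy input (made-up names and coordinates, not real annotations).
toy_genes = [
    Gene("A", "1", 100, 500, "+"),
    Gene("B", "1", 400, 900, "-"),
    Gene("C", "1", 1000, 1200, "+"),
    Gene("D", "2", 100, 300, "+"),
    Gene("E", "2", 250, 260, "-"),
    Gene("F", "2", 290, 400, "+"),
]

for cluster in overlap_clusters(toy_genes):
    print([g.name for g in cluster])
```

In this toy input, gene C overlaps nothing and is left out, while D, E, and F form a single three-gene cluster because each overlaps D. Counting the genes in all returned clusters and dividing by the total number of genes gives the overlapping fraction, analogous to the 4,951 of 19,200 reported above.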


Summary

Introduction

The completion of human genome sequences and the advancement of next-generation sequencing technologies have engendered a clear understanding of all human genes. With this information, researchers can accurately determine protein-coding gene structures and boundaries as well as their isoform expression profiles[3]. It also enables researchers to obtain clear, up-to-date information on overlapping protein-coding genes in the human genome. Nakayama et al.[11] reported that same-strand overlap events are more common than opposite-strand overlap events, whereas Sanna et al.[6] indicated that different-strand overlapping genes are the major type in the human genome. This discrepancy suggests that careful annotation and use of DNA loci and RNA transcript information for all protein-coding genes might be crucial for defining overlapped gene pairs and for subsequent analyses[12].

