Abstract

Introduction: Pseudogenes are copies of ancestral genes that arose through gene duplication or reverse transcription (retrotransposition) and have since undergone changes. They have been reported in all types of organisms, from bacteria to mammals. Pseudogenes increase the genetic diversity of many genes through gene conversion and recombination. Three classes of pseudogenes are known: duplicated pseudogenes; processed (retrotransposed) pseudogenes; and unitary (disabled) pseudogenes. Pseudogenes have long been considered nonfunctional genomic sequences. However, recent studies have reported that many of them may have some form of biological activity. It has recently been reported that pseudogenes make up a conspicuous part of the human transcriptome and proteome, as thousands of them are transcribed and hundreds are also translated. It has also been demonstrated that pseudogenes exert important coding-dependent and coding-independent functions within complex regulatory networks. The possibility that these sequences are functional has therefore increased interest in their accurate annotation. To the best of our knowledge, no report is available on high-throughput pseudogene identification in sheep. Therefore, to improve the annotation of the sheep genome, we present the first genome-wide identification of pseudogenes derived from protein-coding genes, using a homology-based computational approach.

Materials and Methods: We estimated the pseudogene content of the sheep genome using an in-house computational annotation pipeline named PseudoPipe. PseudoPipe predicts pseudogenes in the genome using a homology-based method (BLAST and a clustering algorithm). The repeat-masked sheep reference genome (Ovis_aries.Oar_v3.1), the genome annotation GTF file (version 77), and the sequences of all protein-coding genes were downloaded from the Ensembl database.
To identify pseudogenes, we searched the sheep genome in a comprehensive and consistent manner. The key steps in the pipeline were to use BLAST to rapidly align potential “parent” proteins against the intergenic regions of the genome and then to process the resulting “raw hits”: eliminating redundant hits, clustering neighboring hits, and associating and aligning each cluster with a unique parent. Pseudogenes were then classified using a combination of criteria, including homology, intron/exon structure, and the presence of stop codons and frameshifts. Finally, we inspected the results manually and removed false positives. The Gene Ontology (GO) terms of the parent genes from which the pseudogenes were derived were analyzed with DAVID. Furthermore, characteristics of the newly identified candidate pseudogenes were compared with those of known pseudogenes in human, mouse, and cattle.

Results and Discussion: Identifying pseudogenes is vital for improving genome annotation and for understanding disease-related molecular mechanisms. Pseudogene identification is an ongoing effort, with several groups working on it continuously. Its complexity can be addressed by in silico analysis using a homology-based, whole-genome identification approach. Here, using a computational method, we identified 4,098 high-confidence pseudogenes in the sheep genome, comprising 1,102 duplicated and 2,996 processed pseudogenes. GO analysis showed that the parent genes of the identified pseudogenes are significantly enriched in various biological processes and functions, such as mRNA splicing, ribosome structure, rRNA binding, mitochondrial electron transport, and translation. Interestingly, a growing body of evidence suggests that the parent genes of pseudogenes are associated with ribosomal, rRNA-related, and translational biological processes.
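The hit-processing steps described under Materials and Methods (filtering raw BLAST hits, merging neighboring hits that share a parent protein, and classifying the result) can be illustrated with a minimal Python sketch. This is not PseudoPipe's actual code: the `Hit` record, the function names, the 60 bp gap, and the E-value cutoff are all hypothetical choices made for illustration, and the classification rule is reduced to the single criterion of whether the parent gene's intron/exon structure is retained.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Hit:
    """One raw protein-to-genome BLAST hit (hypothetical record layout)."""
    chrom: str
    strand: str
    start: int      # genomic start, inclusive
    end: int        # genomic end, inclusive
    parent: str     # ID of the query ("parent") protein
    evalue: float


def cluster_hits(hits, max_gap=60, max_evalue=1e-4):
    """Merge hits to the same parent that overlap or lie within max_gap bp.

    Both thresholds are illustrative assumptions, not values from the study.
    Returns a list of (chrom, strand, parent, start, end) tuples.
    """
    def group_key(h):
        return (h.chrom, h.strand, h.parent)

    # Drop weak hits, then sort so hits of one group are adjacent and ordered.
    kept = sorted(
        (h for h in hits if h.evalue <= max_evalue),
        key=lambda h: (group_key(h), h.start, h.end),
    )
    clusters = []
    for (chrom, strand, parent), group in groupby(kept, key=group_key):
        start = end = None
        for h in group:
            if end is not None and h.start - end <= max_gap:
                end = max(end, h.end)          # extend the open cluster
            else:
                if end is not None:            # close the previous cluster
                    clusters.append((chrom, strand, parent, start, end))
                start, end = h.start, h.end    # open a new cluster
        clusters.append((chrom, strand, parent, start, end))
    return clusters


def classify(retains_parent_introns: bool) -> str:
    """Toy rule: duplicated copies keep intron/exon structure, processed ones do not."""
    return "duplicated" if retains_parent_introns else "processed"
```

A real pipeline would additionally pick one best parent per genomic locus and re-align each cluster to that parent to detect stop codons and frameshifts; those steps are omitted here to keep the sketch short.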
Detailed comparison with other species showed that our results are consistent with previous studies. For example, the distribution of pseudogenes across the sheep chromosomes was consistent with that in the human and mouse genomes. Moreover, duplicated pseudogenes are reported to be commonly found on the same chromosome as their parent genes. In our results, about 28% of the identified duplicated pseudogenes were on the same chromosome as their parent genes. The agreement between the results of this study and previous studies supports the accuracy of the method used.

Conclusion: This study has generated, for the first time, a catalog of 4,098 putative sheep pseudogenes. Our findings provide evidence of the pseudogene content of the sheep genome, which is a starting point for understanding their regulatory mechanisms. The identification of these novel pseudogenes will help improve the annotation of the sheep genome. Similar methods can also be used to improve the genome annotation of other organisms.
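A statistic such as the share of duplicated pseudogenes located on the same chromosome as their parent genes (about 28% in this study) could be computed along the following lines. This is a hedged sketch: the function name and the input layout, a list of (pseudogene chromosome, parent chromosome) pairs, are assumptions, and any values passed to it would have to come from the actual annotation rather than from this example.

```python
def same_chromosome_fraction(pairs):
    """Fraction of pseudogenes that share a chromosome with their parent gene.

    pairs: iterable of (pseudogene_chrom, parent_chrom) tuples
           (hypothetical input layout).
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no pseudogene-parent pairs supplied")
    same = sum(1 for pseudo_chrom, parent_chrom in pairs
               if pseudo_chrom == parent_chrom)
    return same / len(pairs)
```

Restricting the input to duplicated pseudogenes before calling the function gives the quantity discussed in the text; applying it to processed pseudogenes would allow the two classes to be compared.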
