Abstract

The capacity of genome sequence analysis lags significantly behind rapidly evolving sequencing technologies. Retrieving biologically meaningful information from an ever-increasing amount of genome data would greatly benefit functional genomic studies. For example, the duplication, organization, evolution, and function of superfamily genes are arguably important in many aspects of life. However, incomplete annotation in many sequenced genomes often leads to biased conclusions in comparative genomic studies of superfamilies. Here, we present a Perl program, called Closing Target Trimming (CTT), for automatically identifying most, if not all, members of a gene family in any sequenced genome on the CentOS 7 platform. To support broader use on other operating systems, we also created a Docker application package, CTTdocker. Our test data on the F-box gene superfamily showed 78.2 and 79% gene-finding accuracies in two well-annotated plant genomes, Arabidopsis thaliana and rice, respectively. To further demonstrate the effectiveness of this program, we ran it on 18 plant genomes and five non-plant genomes to compare the expansion of the F-box and BTB superfamilies. The program discovered that, on average, 12.7 and 9.3% of the total F-box and BTB members, respectively, are new loci in plant genomes, whereas it found only a small number of new members in vertebrate genomes. Therefore, different evolutionary and regulatory mechanisms of Cullin-RING ubiquitin ligases may be present in plants and animals. We also annotated and compared the Pkinase family members across a wide range of organisms, including 10 fungi, 10 metazoa, 10 vertebrates, and 10 additional plants, which were randomly selected from the Ensembl database. Our CTT annotation recovered on average 14% more loci, including pseudogenes, of the Pkinase superfamily in these 40 genomes, demonstrating its robust replicability and scalability in annotating superfamily members in any genome.

Highlights

  • Genome growth and innovation are largely attributed to the expansion of gene superfamilies [1,2,3]

  • Synthesized members in a family are commonly recognized to undergo either pseudogenization or fixation in a genome through four processes: 1) conservation if gene dosage is beneficial [1], 2) neofunctionalization if a novel function is acquired in one copy [1], 3) subfunctionalization if daughter copies split the function of the ancestral copy [7], or 4) specialization if all daughter copies perform new functions differing from that of their ancestral copy [8]

  • Transcription factors tend to have a much higher fixation rate than other families, such as the G-protein-coupled chemosensory receptors for finding food, odorants, and pheromones, the immunoglobulins involved in the primary immune defense in vertebrates, and the nucleotide-binding site-leucine-rich repeat (NBS-LRR) receptors, whose functions lie largely in defending against pathogens in plants [11,12,13,14,15,16]



Introduction

Genome growth and innovation are largely attributed to the expansion of gene superfamilies [1,2,3]. Through both whole-genome duplication (WGD) and small-scale duplication (SSD) events, the number of members in a gene family can increase dramatically [4,5,6]. Their significant contribution to genome evolution, together with the striking variance in evolutionary processes among members within and between families, gives the study of gene duplication an important role in decoding the function of genomes. Precisely finding the total members of a gene family in a genome would be the first effort

