Abstract

Comparative genomics is the analysis of genomic relationships among different species and serves as a significant base for evolutionary and functional genomic studies. GreenPhylDB (https://www.greenphyl.org) is a database designed to facilitate the exploration of gene families and homologous relationships among plant genomes, including staple crops critically important for global food security. GreenPhylDB has been available since 2007, following the release of the Arabidopsis thaliana and Oryza sativa genomes, and has undergone multiple releases. With the number of plant genomes currently available, selecting a single reference for comparative genomics studies has become challenging, yet databases that take advantage of several genomes per species for orthology detection are still lacking. GreenPhylDBv5 introduces the concept of comparative pangenomics by harnessing multiple genome sequences per species. We created 19 pangenes and processed them together with other species that still rely on a single genome. In total, 46 plant species were considered to build gene families and predict their homologous relationships through phylogeny-based analyses. In addition, since the previous publication, we have rejuvenated the website and added a new set of original tools, including protein-domain combination and tree topology searches, and a section where users can store their own results to support community curation efforts.

Nucleic Acids Research, 2021, Vol. 49, Database issue, D1468. © The Author(s) 2020.

Highlights

  • Plant comparative genomics resources usually compare reference genomes to compute homology sequences and en[…]

  • […] particularly helpful when searching for transcription factors, whose sequences must contain some domains but not others [35], as the Markov Cluster Algorithm (MCL) may fail to group them accurately

  • Use the homologous sequence search directly. This new version of GreenPhylDB provides a unique way to scale up plant comparative genomics studies across multiple plant species by leveraging pangenomic datasets
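
The idea behind a protein-domain combination search (sequences that must contain some domains but not others) can be pictured as a simple set filter. The sketch below is only an illustration under stated assumptions: the function name, the input layout, and the domain labels are hypothetical and are not taken from GreenPhylDB.

```python
def match_domain_combination(annotations, required, excluded):
    """Return IDs of sequences that carry every `required` domain
    and none of the `excluded` domains.

    `annotations` maps a sequence ID to the set of domain labels
    found on that sequence (hypothetical layout, for illustration).
    """
    required, excluded = set(required), set(excluded)
    return sorted(
        seq_id
        for seq_id, domains in annotations.items()
        if required <= set(domains) and not (excluded & set(domains))
    )


# Made-up annotations; the domain labels are placeholders, not real identifiers.
annotations = {
    "seq1": {"DOMAIN_A", "DOMAIN_B"},
    "seq2": {"DOMAIN_A"},
    "seq3": {"DOMAIN_A", "DOMAIN_C"},
}

hits = match_domain_combination(
    annotations, required={"DOMAIN_A"}, excluded={"DOMAIN_C"}
)
print(hits)
```

Here `seq3` is rejected because it carries the excluded domain, which is the kind of distinction a similarity-only clustering step may not make.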

Introduction

Plant comparative genomics resources usually compare reference genomes to compute homology sequences and en[…] For single-copy gene clusters (a single sequence per genome), protein sequences were aligned using MAFFT v7.313 [22], with parameters adjusted according to the number of sequences, and an automatic procedure was applied to generate a consensus sequence (Figure 1A). Singletons (clusters of one sequence) were searched with DIAMOND [23], using the default e-value, against the protein-coding genes of all other species' genomes to assess their putative prediction accuracy.
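
The cluster handling described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the text does not specify how the consensus was generated, so a majority-rule consensus is assumed here, and the function names and toy sequences are hypothetical. The alignment itself (MAFFT) and the similarity search (DIAMOND) are left out.

```python
from collections import Counter


def classify_cluster(members):
    """Classify a cluster given (genome_id, sequence_id) pairs.

    Assumption: "single copy" means no genome contributes more than
    one sequence to the cluster.
    """
    if len(members) == 1:
        return "singleton"
    genomes = [genome for genome, _ in members]
    if len(set(genomes)) == len(genomes):
        return "single_copy"
    return "multi_copy"


def majority_consensus(aligned, gap="-"):
    """Majority-rule consensus of equal-length aligned sequences.

    Columns whose most frequent symbol is a gap are dropped.
    This rule is an assumption made for illustration only.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    consensus = []
    for column in zip(*aligned):
        residue, _ = Counter(column).most_common(1)[0]
        if residue != gap:
            consensus.append(residue)
    return "".join(consensus)


# Toy aligned protein fragments (made up for the example).
aligned = ["MKT-LV", "MKTALV", "MRT-LV"]
kind = classify_cluster([("genome1", "a"), ("genome2", "b"), ("genome3", "c")])
consensus = majority_consensus(aligned)
print(kind, consensus)
```

In this sketch, only clusters classified as `single_copy` would be passed to the consensus step, while `singleton` clusters would instead go to a similarity search against the other species.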
