Abstract

A functional Non-Tandem Duplicated Cluster (FNTDC) is a group of non-tandem-duplicated genes that are located closer together than expected by chance and have a role in the same biological function. The identification of FNTDCs related to secondary compounds has attracted growing interest in recent years, but few ab initio attempts to identify FNTDCs covering all biological functions, including primary metabolism compounds, have been carried out. We report an extensive FNTDC dataset accompanied by a detailed assessment of the parameters used for genome scanning and their impact on FNTDC detection. We propose 70% identity and 70% alignment coverage as intermediate settings to exclude tandem-duplicated genes, and a dynamic scanning window of 24 genes. These settings were applied to the rice, arabidopsis and grapevine genomes to call FNTDCs. Besides the best-known secondary metabolism clusters, we identified many FNTDCs associated with primary metabolism, including macromolecule synthesis/editing, TOR signalling, ubiquitination, and proton and electron transfer complexes. Using the intermediate FNTDC parameter settings (at P-value 1e-6), 130, 70 and 140 candidate FNTDCs were called in rice, arabidopsis and grapevine, respectively, and 20 to 30% of the GO tags associated with called FNTDCs were common to the three genomes. The datasets developed in this work provide a rich framework for pinpointing candidate FNTDCs across all GO-BP tags, covering both primary and secondary metabolism, with large macromolecular complexes/metabolons as the most represented FNTDCs. Notably, several FNTDCs are tagged with GO terms referring to organelle-targeted multi-enzyme complexes, a finding that suggests the migration of endosymbiont gene chunks towards the nucleus could be at the origin of this class of candidate FNTDCs. Most FNTDCs appear to have evolved prior to genome duplication events.
More than one-third of genes interspersed/adjacent to called FNTDCs lacked any functional annotation; however, their co-localization may provide hints towards a candidate biological role.
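
As a rough illustration of the kind of scan described above, the following Python sketch slides a window of 24 genes along a chromosome and flags windows in which a GO term is over-represented at P ≤ 1e-6. It is not the authors' pipeline: the hypergeometric test, the fixed (rather than dynamic) window, the use of a single chromosome as background, the `min_hits` threshold, and the function names `hypergeom_sf` and `scan_windows` are all assumptions made for the example. Tandem duplicates (70% identity, 70% alignment coverage) are assumed to have been removed beforehand.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the term.

    Assumption: a hypergeometric null model; the source does not name its test.
    """
    upper = min(K, n)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def scan_windows(gene_terms, window=24, alpha=1e-6, min_hits=3):
    """Flag windows where a GO term is over-represented.

    gene_terms: list of sets of GO terms, one set per gene, in chromosomal
    order, with tandem duplicates already filtered out (assumed).
    Returns a list of (window_start, term, hits_in_window, p_value).
    """
    N = len(gene_terms)

    # Background: how many genes carry each term across the whole sequence.
    background = {}
    for terms in gene_terms:
        for t in terms:
            background[t] = background.get(t, 0) + 1

    flagged = []
    for start in range(max(1, N - window + 1)):
        chunk = gene_terms[start:start + window]
        local = {}
        for terms in chunk:
            for t in terms:
                local[t] = local.get(t, 0) + 1
        for t, k in local.items():
            if k < min_hits:
                continue  # too few genes to call a cluster (hypothetical cutoff)
            p = hypergeom_sf(k, N, background[t], len(chunk))
            if p <= alpha:
                flagged.append((start, t, k, p))
    return flagged


if __name__ == "__main__":
    # Toy chromosome: 1000 unannotated genes plus six adjacent genes
    # sharing one made-up term.
    genes = [set() for _ in range(1000)]
    for i in range(100, 106):
        genes[i] = {"GO:EXAMPLE"}
    for start, term, k, p in scan_windows(genes)[:3]:
        print(start, term, k, p)
```

Overlapping windows that flag the same term would normally be merged into a single candidate cluster, and P-values would need correcting for the number of windows and terms tested; both steps are omitted here for brevity.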

Highlights

  • Functional Non-Tandem Duplicated Clusters (FNTDCs) are groups of non-tandem-duplicated genes that have a role in the same biological function and are located on the genome closer together than expected by chance [1,2,3,4,5]

  • We report FNTDC scans of rice, grapevine and arabidopsis, three phylogenetically distinct plant species with very high-quality genomes and abundant sequence information, probably the best examples in their respective categories

  • Despite the strong interest in secondary-compound clusters, which has prompted the recent development of several dedicated tools, much less information is available on functional clusters covering all biological functions, including primary metabolism compounds

Summary

Introduction

Functional Non-Tandem Duplicated Clusters (FNTDCs) are groups of non-tandem-duplicated genes that have a role in the same biological function and are located on the genome closer together than expected by chance [1,2,3,4,5]. Interest in secondary-compound clusters has prompted the development of comprehensive bioinformatic tools (e.g. plantismash [5, 7], Phytoclust [8]) for thorough analyses based on libraries of profile Hidden Markov Models (pHMMs) of enzyme gene families involved in plant biosynthetic pathways. These tools are designed to target biosynthetic gene clusters for secondary metabolism and fail to detect the full range of FNTDCs in a broad sense, i.e. clusters of non-tandem genes that share any biological function as specified by GO biological process tags. Hints of several primary metabolic clusters have been obtained in further searches which were, again, based solely on enzymes classified by predicted catalytic function, despite covering larger enzyme classes [1 and references therein], [9].
