A functional Non-Tandem Duplicated Cluster (FNTDC) is a group of non-tandem-duplicated genes that are located closer than expected by mere chance and have a role in the same biological function. The identification of secondary-compounds–related FNTDC has gained increased interest in recent years, but little ab-initio attempts aiming to the identification of FNTDCs covering all biological functions, including primary metabolism compounds, have been carried out. We report an extensive FNTDC dataset accompanied by a detailed assessment on parameters used for genome scanning and their impact on FNTDC detection. We propose 70% identity and 70% alignment coverage as intermediate settings to exclude tandem duplicated genes and a dynamic scanning window of 24 genes. These settings were applied to rice, arabidopsis and grapevine genomes to call for FNTDCs. Besides the best-known secondary metabolism clusters, we identified many FNTDCs associated to primary metabolism ranging from macromolecules synthesis/editing, TOR signalling, ubiquitination, proton and electron transfer complexes. Using the intermediate FNTDC setting parameters (at P-value 1e-6), 130, 70 and 140 candidate FNTDCs were called in rice, arabidopsis and grapevine, respectively, and 20 to 30% of GO tags associated to called FNTDC were common among the 3 genomes. The datasets developed along with this work provide a rich framework for pinpointing candidate FNTDCs reflecting all GO-BP tags covering both primary and secondary metabolism with large macromolecular complexes/metabolons as the most represented FNTDCs. Noteworthy, several FNTDCs are tagged with GOs referring to organelle-targeted multi-enzyme complex, a finding that suggest the migration of endosymbiont gene chunks towards nuclei could be at the basis of these class of candidate FNTDCs. Most FNTDC appear to have evolved prior of genome duplication events. More than one-third of genes interspersed/adjacent to called FNTDCs lacked any functional annotation; however, their co-localization may provide hints towards a candidate biological role.
Read full abstract