Abstract

BackgroundThe comparison of gene and genome structures across species has the potential to reveal major trends of genome evolution. However, such a comparative approach is currently hampered by a lack of standardization (e.g., Elliott TA, Gregory TR, Philos Trans Royal Soc B: Biol Sci 370:20140331, 2015). For example, testing the hypothesis that the total amount of coding sequences is a reliable measure of potential proteome diversity (Wang M, Kurland CG, Caetano-Anollés G, PNAS 108:11954, 2011) requires the application of standardized definitions of coding sequence and genes to create both comparable and comprehensive data sets and corresponding summary statistics. However, such standard definitions either do not exist or are not consistently applied. These circumstances call for a standard at the descriptive level using a minimum of parameters as well as an undeviating use of standardized terms, and for software that infers the required data under these strict definitions. The acquisition of a comprehensive, descriptive, and standardized set of parameters and summary statistics for genome publications and further analyses can thus greatly benefit from the availability of an easy to use standard tool.ResultsWe developed a new open-source command-line tool, COGNATE (Comparative Gene Annotation Characterizer), which uses a given genome assembly and its annotation of protein-coding genes for a detailed description of the respective gene and genome structure parameters. Additionally, we revised the standard definitions of gene and genome structures and provide the definitions used by COGNATE as a working draft suggestion for further reference. Complete parameter lists and summary statistics are inferred using this set of definitions to allow down-stream analyses and to provide an overview of the genome and gene repertoire characteristics. COGNATE is written in Perl and freely available at the ZFMK homepage (https://www.zfmk.de/en/COGNATE) and on github (https://github.com/ZFMK/COGNATE).ConclusionThe tool COGNATE allows comparing genome assemblies and structural elements on multiples levels (e.g., scaffold or contig sequence, gene). It clearly enhances comparability between analyses. Thus, COGNATE can provide the important standardization of both genome and gene structure parameter disclosure as well as data acquisition for future comparative analyses. With the establishment of comprehensive descriptive standards and the extensive availability of genomes, an encompassing database will become possible.

Highlights

  • The comparison of gene and genome structures across species has the potential to reveal major trends of genome evolution

  • The output of COGNATE can be used in manifold analyses, ranging from a descriptive characterization to an in-depth comparative analysis of gene organization across multiple genomes. This is further exemplified in the discussion. It is an essential feature of COGNATE to provide descriptive statistics and the complete primary data, since “an over-reliance on simple summary statistics [...] can obscure real biological trends and differences” ([2], p. 1191)

  • Apart from other already mentioned potential applications, COGNATE output can be used to study the variability of gene structure within a genome and to compare it with that in other genomes

Read more

Summary

Results

We developed a new open-source command-line tool, COGNATE (Comparative Gene Annotation Characterizer), which uses a given genome assembly and its annotation of protein-coding genes for a detailed description of the respective gene and genome structure parameters. We revised the standard definitions of gene and genome structures and provide the definitions used by COGNATE as a working draft suggestion for further reference. Complete parameter lists and summary statistics are inferred using this set of definitions to allow down-stream analyses and to provide an overview of the genome and gene repertoire characteristics. COGNATE is written in Perl and freely available at the ZFMK homepage (https://www.zfmk.de/en/ COGNATE) and on github (https://github.com/ZFMK/COGNATE)

Conclusion
Background
Results and discussion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call