Abstract

The use of draft genomes of different species and re-sequencing of accessions and populations are now common tools for plant biology research. The de novo assembled draft genomes make it possible to identify pivotal divergence points in the plant lineage and provide an opportunity to investigate the genomic basis and timing of biological innovations by inferring orthologs between species. Furthermore, re-sequencing facilitates the mapping and subsequent molecular characterization of causative loci for traits, such as those for plant stress tolerance and development. In both cases high-quality gene annotation—the identification of protein-coding regions, gene promoters, and 5′- and 3′-untranslated regions—is critical for investigation of gene function. Annotations are constantly improving but automated gene annotations still require manual curation and experimental validation. This is particularly important for genes with large introns, genes located in regions rich with transposable elements or repeats, large gene families, and segmentally duplicated genes. In this opinion paper, we highlight the impact of annotation quality on evolutionary analyses, genome-wide association studies, and the identification of orthologous genes in plants. Furthermore, we predict that incorporating accurate information from manual curation into databases will dramatically improve the performance of automated gene predictors.

Highlights

  • The ongoing development of next-generation sequencing techniques has led to a remarkable decrease in the cost of genome sequencing. This is reflected in the increasing number of genome assemblies from all domains of life.

  • The result has been a dramatic increase in the amount and quality of information available for biological research, which often relies on gene model annotations representing exon–intron structures, regulatory elements [e.g., promoter elements, enhancers, and 5′- and 3′-untranslated regions (UTRs)], and locations of transposable elements (TEs) and repeat sequences.

  • Errors in gene annotation have a strong impact on the results obtained, especially in phylogenomic analyses or in the functional interpretation of single-nucleotide polymorphisms detected in genome-wide association studies

Introduction

The ongoing development of next-generation sequencing techniques has led to a remarkable decrease in the cost of genome sequencing. This is reflected in the increasing number of genome assemblies from all domains of life. For members of large gene families with high sequence similarity, it can be difficult to distinguish splice variants from recently diverged gene models, especially if the transcriptome data were sequenced from a different individual. This problem can be expected to become more prominent as more genomes from (auto)polyploid plants become available.
