Abstract

Background: The K-mer approach, treating genomic sequences as simple characters and counting the relative abundance of each string upon a fixed K, has been extensively applied to phylogeny inference for genome assembly, annotation, and comparison.

Results: To meet increasing demands for comparing large genome sequences and to promote the use of the K-mer approach, we develop a versatile database, KGCAK (http://kgcak.big.ac.cn/KGCAK/), containing ~8,000 genomes that include genome sequences of diverse life forms (viruses, prokaryotes, protists, animals, and plants) and cellular organelles of eukaryotic lineages. It builds phylogeny based on genomic elements in an alignment-free fashion and provides in-depth data processing enabling users to compare the complexity of genome sequences based on K-mer distribution.

Conclusion: We hope that KGCAK becomes a powerful tool for exploring relationships within and among groups of species in a tree of life based on genomic data.

Reviewers: This article was reviewed by Prof Mark Ragan and Dr Yuri Wolf.

Highlights

  • Over the past few decades, phylogenies have often been built from defined evolutionarily-conserved gene families and occasionally from sequences of whole genomes

  • The K-mer technique has been shown to be exceedingly effective in a variety of genomic applications, including genome assembly, motif discovery, repetitive sequence identification, and genome complexity assessment [2,3,4,5,6]

  • Genomes and gene annotations were acquired from Ensembl, Phytozome and NCBI genome databases


Summary

Introduction

Wang et al. Biology Direct (2015) 10:53

Over the past few decades, phylogenies have often been built from defined evolutionarily conserved gene families and occasionally from sequences of whole genomes. The K-mer technique has been shown to be exceedingly effective in a variety of genomic applications, including genome assembly, motif discovery, repetitive sequence identification, and genome complexity assessment [2,3,4,5,6]. With the rapid accumulation of large genomic datasets in diverse species, the need for an easy-to-use database that stores and visualizes processed K-mer based data is obvious, and [...] genomes into easy-to-understand and visualized data from a comparative genomics perspective. The K-mer approach, treating genomic sequences as simple characters and counting the relative abundance of each string upon a fixed K, has been extensively applied to phylogeny inference for genome assembly, annotation, and comparison.
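To make the counting step concrete, the following is a minimal sketch of a K-mer profile: it slides a window of fixed length K along a sequence and reports the relative abundance of each string. The function names `kmer_frequencies` and `profile_distance`, the restriction to the A/C/G/T alphabet, and the use of Euclidean distance to compare two profiles are all assumptions made for illustration; the source does not state which distance measure or preprocessing KGCAK itself uses.

```python
from collections import Counter
from math import sqrt

DNA_ALPHABET = set("ACGT")  # assumption: windows with other symbols (e.g. N) are skipped


def kmer_frequencies(sequence, k):
    """Return the relative abundance of each overlapping k-mer in `sequence`."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= DNA_ALPHABET:
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def profile_distance(p, q):
    """Euclidean distance between two k-mer frequency profiles (illustrative choice)."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))


if __name__ == "__main__":
    a = kmer_frequencies("ACGTACGTAC", 2)
    b = kmer_frequencies("ACGTTTTTAC", 2)
    print(a)
    print(profile_distance(a, b))
```

A matrix of such pairwise distances is the kind of input an alignment-free tree-building step could consume, although how KGCAK constructs its phylogenies in detail is not described in this section.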

