K-mer Content, Correlation, and Position Analysis of Genome DNA Sequences for the Identification of Function and Evolutionary Features.

Aaron Sievers, Georg Hildenbrand, Michael Hausmann, Patrick Froß, Marc Bisch, Chris Dreessen, Katharina Bosiek, Jascha Riedel

doi:10.3390/genes8040122

Abstract

In genome analysis, k-mer-based comparison methods have become standard tools. However, even though they are able to deliver reliable results, other algorithms seem to work better in some cases. To improve k-mer-based DNA sequence analysis and comparison, we checked whether adding positional resolution is beneficial for finding and/or comparing interesting organizational structures. A simple but efficient algorithm for extracting and saving local k-mer spectra (frequency distributions of k-mers) was developed and used. The results were analyzed by including positional information, visualized as genomic maps, and by applying basic vector correlation methods. The analysis concentrated on small word lengths (1 ≤ k ≤ 4) in the relatively small viral genomes of Papillomaviridae and Herpesviridae. Its usability for larger sequences was also checked, namely on human chromosome 2 and the homologous chromosomes (2A, 2B) of a chimpanzee. Using this alignment-free analysis, several regions with specific characteristics in Papillomaviridae and Herpesviridae were confirmed; these had formerly been identified by independent, mostly alignment-based methods. Correlations between the k-mer content and several genes in these genomes were found, showing similarities between classified and unclassified viruses that may be useful for further taxonomic research. Furthermore, previously unknown k-mer correlations in the genomes of Human Herpesviruses (HHVs), which probably have a major biological function, are found and described. Using the currently known chromosome sequences of chimpanzee and human, identities between the two species were reproduced on every analyzed chromosome. This demonstrates the feasibility of our approach for large data sets of complex genomes.
Based on these results, we suggest k-mer analysis with positional resolution as a method for closing the gap between the effectiveness of alignment-based methods (like NCBI BLAST) and the high speed of standard k-mer analysis.
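The abstract does not spell out the extraction algorithm. The following is a minimal Python sketch, under our own assumptions, of what local k-mer spectra with positional resolution can look like:

- The sequence is cut into fixed-size windows.
- k-mers are counted separately in each window.
- The per-window counts of two k-mers (their positional profiles) are compared with a Pearson correlation.

Several choices are ours and not the authors': the window size and step, dropping k-mers that cross window boundaries, and Pearson correlation standing in for the "basic vector correlation methods". The function names are likewise hypothetical.

```python
import math
from collections import Counter

def local_kmer_spectra(seq, k, window, step=None):
    """Return (start, Counter) pairs holding the k-mer counts of each window.

    Assumptions of this sketch: windows advance by `step` (default `window`,
    i.e. non-overlapping), a trailing window shorter than `window` is
    dropped, k-mers crossing a window boundary are not counted, and k-mers
    containing characters other than A/C/G/T (e.g. N) are skipped.
    """
    seq = seq.upper()
    step = step or window
    spectra = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        counts = Counter(
            chunk[i:i + k]
            for i in range(len(chunk) - k + 1)
            if set(chunk[i:i + k]) <= set("ACGT")
        )
        spectra.append((start, counts))
    return spectra

def profile(spectra, kmer):
    """Positional profile of one k-mer: its count in every window (0 if absent)."""
    return [counts[kmer] for _, counts in spectra]

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (nan if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)

# Toy sequence: AT-rich and GC-rich 8 bp blocks alternate.
seq = "ATATATAT" "GCGCGCGC" "ATATATAT" "GCGCGCGC"
spectra = local_kmer_spectra(seq, k=2, window=8)
print(profile(spectra, "AT"))  # [4, 0, 4, 0]
print(profile(spectra, "GC"))  # [0, 4, 0, 4]
print(pearson(profile(spectra, "AT"), profile(spectra, "GC")))  # -1.0
```

Stacking the per-window counts of all 4^k k-mers gives a k-mer × position matrix, which is the kind of data a genomic map can visualize. A global spectrum sums these columns and loses the position axis. Overlapping windows (a `step` smaller than `window`) trade memory for smoother profiles.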

Highlights

In recent years, k-mer-based analysis and comparison methods have become standard tools for the analysis of large DNA sequences such as chromosomes, whole genomes, or even metagenomes.
The big advantage of k-mer-based methods compared to alignment-based methods such as the well-established …
The result of a standard k-mer analysis mostly consists only of a number representing the similarity between each pair of sequences.

Summary

Introduction

K-mer-based analysis and comparison methods have become standard tools for the analysis of large DNA sequences such as chromosomes, whole genomes, or even metagenomes. However, in some cases their results are still unsatisfactory compared with those of other methods. One example is determining the phylogenetic distance between two genomes. Here, aligning short DNA motifs such as highly conserved ribosomal RNA genes delivers very reliable results, while the results of k-mer methods are often uncertain [3]. Perhaps the main difference between an alignment-based and a k-mer-based method applied to the same data set (e.g., a comparison of two genome sequences) lies in what they return. The alignment result can include the exact position (in bp) and the quality of similarity of every part of the sequences in the data set. The result of a standard k-mer analysis, by contrast, mostly consists only of a single number representing the similarity between each pair of sequences.
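To make the contrast concrete, here is a minimal Python sketch of a standard, position-free k-mer comparison. It assumes Euclidean distance between relative k-mer frequency vectors as the similarity measure. This is one of several common choices and not necessarily the one used in the cited work; the function names are hypothetical. The whole comparison collapses into one number, so two sequences with the same composition but a different arrangement can look identical.

```python
import math
from collections import Counter
from itertools import product

def kmer_frequency_vector(seq, k):
    """Relative frequencies of all 4**k DNA k-mers, in lexicographic order.

    k-mers containing characters other than A/C/G/T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers) or 1  # avoid division by zero
    return [counts[m] / total for m in kmers]

def euclidean_distance(u, v):
    """Reduce two k-mer spectra to a single dissimilarity value (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

a = "AAAACCCC"
b = "CCCCAAAA"  # same bases, opposite order

# k = 1: identical base composition, so the distance is exactly zero.
print(euclidean_distance(kmer_frequency_vector(a, 1), kmer_frequency_vector(b, 1)))  # 0.0
# k = 2: only AC (in a) versus CA (in b) differ, so the distance is small but non-zero.
print(euclidean_distance(kmer_frequency_vector(a, 2), kmer_frequency_vector(b, 2)))
```

Neither value says where in the sequences the difference lies. This is the positional information that alignment results provide and that a global k-mer spectrum discards.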

Journal: Genes	Publication Date: Apr 19, 2017
Citations: 39	License type: CC BY 4.0

Similar Papers

Physical and genetic maps of the human herpesvirus 7 strain SB genome.
G Dominguez ... N Inoue
Archives of Virology | VOL. 141 | 01 Dec 1996

Detection of HHV-8 (human herpesvirus-8) genomes in induced peripheral blood mononuclear cells (PBMCs) from US blood donors
L Qu ... D T Rowe
Vox Sanguinis | VOL. 100 | 02 Sep 2010

Human herpesvirus 8 genomes and seroprevalence in United States blood donors
Lirong Qu ... Darrell J Triulzi
Transfusion | VOL. 50 | 28 Apr 2010

Quantification of human herpesvirus 6 in healthy volunteers and patients with lymphoproliferative disorders by PCR-ELISA
Junko H Ohyashiki ... Kohtaro Yamamoto
Leukemia Research | VOL. 23 | 02 Jun 1999