Positional Correlation Natural Vector: A Novel Method for Genome Comparison.

Lily He, Rui Dong, Stephen S.-T. Yau, Rong Lucy He

doi:10.3390/ijms21113859

Abstract

Advances in sequencing technology have made large amounts of biological data available. Evolutionary analysis of data such as DNA sequences is highly important in biological studies. Because alignment methods are inherently costly and therefore ineffective for analyzing large-scale data, alignment-free methods have recently attracted attention in bioinformatics. In this paper, we introduce a new positional correlation natural vector (PCNV) method that converts a DNA sequence into an 18-dimensional numerical feature vector. The PCNV of a DNA sequence is obtained by representing its nucleotide distribution through nucleotide frequencies and positional correlations. This new numerical vector design uses six suitable features to characterize the correlation among nucleotide positions in sequences. PCNV is also very easy to compute and can be used for rapid genome comparison. To test our novel method, we used PCNV to perform phylogenetic analysis on several viral and bacterial genome datasets. For comparison, we applied an alignment-based method, Bayesian inference, and two alignment-free methods, feature frequency profile and natural vector, to the same datasets. We found that the PCNV technique is fast and accurate when used for phylogenetic analysis and classification of viruses and bacteria.
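The abstract describes PCNV only at a high level, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes the 18 dimensions split into the 12 components of the classical natural vector (counts n_k, mean positions mu_k, and normalized second central moments D_k for k in {A, C, G, T}) plus one positional-correlation feature per unordered nucleotide pair (C(4, 2) = 6). The natural-vector part follows its usual definition; the pairwise term, `pair_feature`, is a hypothetical stand-in and not the formula from the paper. All function names are illustrative.

```python
import math
from itertools import combinations

NUCLEOTIDES = "ACGT"


def positions(seq: str) -> dict[str, list[int]]:
    """1-based positions of each nucleotide; non-ACGT symbols are skipped."""
    pos = {k: [] for k in NUCLEOTIDES}
    for i, ch in enumerate(seq.upper(), start=1):
        if ch in pos:
            pos[ch].append(i)
    return pos


def natural_vector(seq: str) -> list[float]:
    """12 components: counts n_k, mean positions mu_k, and normalized
    second central moments D_k = sum((s - mu_k)^2) / (n_k * n)."""
    n = len(seq)
    pos = positions(seq)
    counts, means, moments = [], [], []
    for k in NUCLEOTIDES:
        p = pos[k]
        n_k = len(p)
        mu = sum(p) / n_k if n_k else 0.0
        d = sum((s - mu) ** 2 for s in p) / (n_k * n) if n_k else 0.0
        counts.append(float(n_k))
        means.append(mu)
        moments.append(d)
    return counts + means + moments


def pair_feature(p_k: list[int], p_l: list[int],
                 mu_k: float, mu_l: float, n: int) -> float:
    """HYPOTHETICAL stand-in for one positional-correlation feature (not the
    paper's formula): covariance between the i-th occurrence of nucleotide k
    and the i-th occurrence of l over the first m = min(n_k, n_l) occurrences,
    scaled by 1 / (m * n) to mirror D_k."""
    m = min(len(p_k), len(p_l))
    if m == 0:
        return 0.0
    return sum((p_k[i] - mu_k) * (p_l[i] - mu_l) for i in range(m)) / (m * n)


def pcnv_sketch(seq: str) -> list[float]:
    """18-dim vector under the stated assumptions: 12 natural-vector
    components followed by 6 pairwise features (AC, AG, AT, CG, CT, GT)."""
    nv = natural_vector(seq)
    pos = positions(seq)
    mu = dict(zip(NUCLEOTIDES, nv[4:8]))
    pairs = [pair_feature(pos[k], pos[l], mu[k], mu[l], len(seq))
             for k, l in combinations(NUCLEOTIDES, 2)]
    return nv + pairs


def pcnv_distance(a: str, b: str) -> float:
    """Euclidean distance between two sketch vectors (assumed metric)."""
    return math.dist(pcnv_sketch(a), pcnv_sketch(b))


if __name__ == "__main__":
    print(len(pcnv_sketch("ACGTACGT")))  # 18
    print(pcnv_distance("ACGTACGT", "AACCGGTT"))
```

Because sequences of any length map to vectors of the same fixed length, genomes can be compared directly. The Euclidean distance here is an assumption; the resulting distance matrix would then be handed to a clustering or tree-building method such as UPGMA or neighbor-joining.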

Highlights

Predicting the structures, functions, and evolutionary relationships of genes is a fundamental and vital aspect of modern biological research
To demonstrate that the positional correlation natural vector (PCNV) is effective, we applied it to several datasets: the genomes of hepatitis C virus (HCV), hepatitis B virus (HBV), human papillomavirus (HPV), dengue virus (DENV), and 59 bacterial species
We found that PCNV categorizes the dataset into the correct biological groups in 0.78 s (Figure 4a; Table 1); this is much faster than the feature frequency profiles (FFP) method, which takes 35 s (Table 1)

Summary

Introduction

Predicting the structures, functions, and evolutionary relationships of genes is a fundamental and vital aspect of modern biological research. A notable common feature of alignment-free (AF) approaches is the analysis of special numerical properties of the sequences being compared. AF approaches include iterated function systems [5], information theory [6], Fourier transformations [7], sequence representations based on chaos theory [8], and moments of the positions of the nucleotides [9,10]. The most widely used AF approach is the k-mer-based method, which has been reported in many studies [11,12,13,14,15,16,17,18,19]. This method analyzes the frequencies of substrings of a specific length k within sequences [20]. Several k-mer-based methods have been developed and applied to the phylogenetic analysis of bacteria and viruses. A notable example is feature frequency profiles (FFP) [21].
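To make the k-mer idea concrete, here is a minimal FFP-style sketch: each sequence becomes a profile of overlapping k-mer frequencies, and two profiles are compared with the Jensen-Shannon divergence, the measure FFP uses. The function names are illustrative, and the sketch deliberately omits parts of the published FFP workflow, such as selecting an optimal resolution k.

```python
import math
from collections import Counter


def kmer_profile(seq: str, k: int) -> dict[str, float]:
    """Relative frequencies of overlapping k-mers; k-mers containing
    symbols other than A, C, G, T are skipped."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no valid k-mers for this k")
    return {w: c / total for w, c in counts.items()}


def js_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence with log base 2, so 0 <= JS <= 1."""
    js = 0.0
    for w in set(p) | set(q):
        pw, qw = p.get(w, 0.0), q.get(w, 0.0)
        mw = 0.5 * (pw + qw)  # mixture distribution M = (P + Q) / 2
        if pw > 0:
            js += 0.5 * pw * math.log2(pw / mw)
        if qw > 0:
            js += 0.5 * qw * math.log2(qw / mw)
    return js


if __name__ == "__main__":
    p = kmer_profile("ACGTACGTAC", 3)
    q = kmer_profile("ACGTTGCAAC", 3)
    print(js_divergence(p, q))
```

The choice of k matters: small k yields profiles that are too generic, while large k yields sparse profiles that share few features, which is why FFP treats the resolution as a parameter to optimize.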

Journal: International Journal of Molecular Sciences
Publication Date: May 29, 2020
Citations: 5
License type: CC BY 4.0

Similar Papers

Whole-proteome phylogeny of large dsDNA virus families by an alignment-free method
Guohong Albert Wu ... Gregory E Sims
Proceedings of the National Academy of Sciences | VOL. 106 | 04 Aug 2009

Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions
Gregory E Sims ... Sung-Hou Kim
Proceedings of the National Academy of Sciences | VOL. 106 | 24 Feb 2009

A hybrid approach for predicting transcription factors
Sumeet Patiyal ... Gajendra P S Raghava
Frontiers in Bioinformatics | VOL. 4 | 01 Jan 2024

2011 German Escherichia coli O104:H4 outbreak: Alignment-free whole-genome phylogeny by feature frequency profiles
Man Kit Cheung ... Hoi Shan Kwan
Nature Precedings | 27 Jul 2011
