Alignment-free method for DNA sequence clustering using Fuzzy integral similarity

Ajay Kumar Saw, Soumyadeep Nandi, Manashi Das, Garima Raj, Narayan Chandra Talukdar, Binod Chandra Tripathy

doi:10.1038/s41598-019-40452-6

Open Access

Abstract

The large volume of sequence data produced by next-generation sequencing and held in private and public databases poses new challenges because of the limitations of alignment-based methods for sequence comparison. There is therefore a pressing need for faster sequence analysis algorithms. In this study, we developed an alignment-free algorithm for faster sequence analysis. The novelty of our approach is the combination of a fuzzy integral with a Markov chain for sequence analysis in an alignment-free model. The method estimates the parameters of a Markov chain from the frequencies of occurrence of all possible nucleotide pairs in each DNA sequence. These estimated Markov chain parameters are then used to calculate the similarity between all pairwise combinations of DNA sequences with a fuzzy integral algorithm. The resulting matrix is used as input to the neighbor program in the PHYLIP package for phylogenetic tree construction. Our method was tested on eight benchmark datasets and on in-house generated datasets (18S rDNA sequences from 11 arbuscular mycorrhizal fungi (AMF) and 16S rDNA sequences of 40 bacterial isolates from the plant interior). The results indicate that the fuzzy integral algorithm is an efficient and feasible alignment-free method for sequence analysis on the genomic scale.
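The parameter-estimation step described in the abstract can be illustrated with a minimal Python sketch. It counts overlapping nucleotide pairs in one sequence and row-normalises the counts into first-order transition probabilities. The function name `transition_matrix`, the row normalisation, and the choice to skip pairs containing symbols other than A, C, G, T are assumptions made for illustration; the abstract does not specify these details.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def transition_matrix(seq):
    """Estimate first-order Markov transition probabilities from one sequence.

    Assumption: probabilities are the row-normalised counts of overlapping
    nucleotide pairs; pairs containing other symbols (e.g. N) are ignored.
    """
    seq = seq.upper()
    # Count every overlapping pair (seq[i], seq[i + 1]).
    counts = Counter(zip(seq, seq[1:]))
    matrix = {}
    for a in NUCLEOTIDES:
        row_total = sum(counts[(a, b)] for b in NUCLEOTIDES)
        matrix[a] = {
            # A row with no observations is left as all zeros.
            b: (counts[(a, b)] / row_total if row_total else 0.0)
            for b in NUCLEOTIDES
        }
    return matrix


if __name__ == "__main__":
    for first, row in transition_matrix("AAACGT").items():
        print(first, row)
```

Each sequence is thereby reduced to 16 numbers regardless of its length, which is what makes a pairwise comparison possible without alignment.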

Highlights

Phylogenetic tree analysis and comparative studies of taxa are essential parts of modern molecular biology.
Examples of match length methods are k-mismatch average common substring[32], average common substring[28], and the Kr method[28]. These methods are commonly used for string processing in computer science.
We propose to use a fuzzy integral[33] to analyze DNA sequences based on a Markov chain[34], an approach that can be categorised as a k-mer or word frequency method.
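To make the idea of a fuzzy-integral similarity concrete, here is a small Python sketch of a Sugeno integral applied to two probability vectors (for example, flattened Markov transition matrices). It is not the authors' formulation: the per-entry agreement `1 - |p_i - q_i|`, the uniform fuzzy measure g(A) = |A| / n, and the names `sugeno_integral` and `fuzzy_similarity` are all assumptions chosen to keep the example short.

```python
def sugeno_integral(h):
    """Sugeno integral of values h (each in [0, 1]).

    Assumption: the fuzzy measure is the uniform one, g(A) = |A| / n,
    which is monotone with g(empty) = 0 and g(whole set) = 1.
    """
    if not h:
        raise ValueError("h must contain at least one value")
    n = len(h)
    ordered = sorted(h, reverse=True)
    # max over i of min(i-th largest value, measure of the top-i subset)
    return max(min(value, (i + 1) / n) for i, value in enumerate(ordered))


def fuzzy_similarity(p, q):
    """Similarity in [0, 1] between two equal-length probability vectors.

    Assumption: per-entry agreement is 1 - |p_i - q_i| (illustrative only).
    """
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    agreement = [1.0 - abs(a - b) for a, b in zip(p, q)]
    return sugeno_integral(agreement)


if __name__ == "__main__":
    print(fuzzy_similarity([0.5, 0.5], [0.5, 0.5]))  # identical vectors
    print(fuzzy_similarity([1.0, 0.0], [0.0, 1.0]))  # maximally different
```

Under these assumptions, identical vectors score 1 and vectors that disagree completely in every entry score 0. A different fuzzy measure (such as a Sugeno λ-measure) would weight the entries differently.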

Summary

Introduction

Phylogenetic tree analysis and comparative studies of taxa are essential parts of modern molecular biology. Large amounts of sequence data produced by next-generation sequencing techniques have become available in private and public databases, which has created new challenges due to the limitations associated with alignment-based approaches. This plethora of sequence information increases the computation and time requirements for genome comparisons in computational biology. We propose to use a fuzzy integral[33] to analyze DNA sequences based on a Markov chain[34], an approach that can be categorised as a k-mer or word frequency method. The consistency of the results can be seen from statistical measures such as AUC (area under the ROC curve) values, calculated from the ROC curves provided in the Supplementary Material.
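As a reminder of what an AUC value measures, the following Python sketch computes it directly from scores and binary labels using the pairwise-ranking definition: the fraction of (positive, negative) pairs in which the positive item receives the higher score, with ties counted as one half. How positives and negatives were defined in the study is not stated here, so the labels in the usage example (for instance, 1 for sequence pairs from the same group) are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via the pairwise-ranking definition.

    scores: similarity scores, higher meaning "more likely positive".
    labels: truthy for positive items, falsy for negative items.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative item")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical similarity scores and same-group labels.
    print(auc([0.9, 0.3, 0.8, 0.1], [1, 1, 0, 0]))
```

An AUC of 1.0 means every positive pair is ranked above every negative pair, while 0.5 corresponds to a ranking no better than chance.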

Methods

Results

Conclusion

Journal: Scientific Reports
Publication Date: Mar 6, 2019
Citations: 20
License type: open-access
