Afann: bias adjustment for alignment-free sequence comparison based on sequencing data using neural network regression

Kujin Tang, Fengzhu Sun, Jie Ren

doi:10.1186/s13059-019-1872-3


Open Access

https://doi.org/10.1186/s13059-019-1872-3


Journal: Genome Biology	Publication Date: Dec 1, 2019
Citations: 17	License type: open-access

Affiliation: University of Southern California

Abstract

Alignment-free methods, more time and memory efficient than alignment-based methods, have been widely used for comparing genome sequences or raw sequencing samples without assembly. However, in this study, we show that alignment-free dissimilarity calculated based on sequencing samples can be overestimated compared with the dissimilarity calculated based on their genomes, and this bias can significantly decrease the performance of the alignment-free analysis. Here, we introduce a new alignment-free tool, Alignment-Free methods Adjusted by Neural Network (Afann) that successfully adjusts this bias and achieves excellent performance on various independent datasets. Afann is freely available at https://github.com/GeniusTang/Afann.

Highlights

With the advent of next-generation sequencing (NGS) technologies, enormous amounts of sequence data are emerging rapidly.
Since background-adjusted dissimilarity measures have been shown to outperform other methods for solving different problems ranging from evolutionary distance estimation [14] to virus-host interaction prediction [15], geographic location prediction [12], horizontal gene transfer detection [16], and metagenome and metatranscriptome comparison [10, 17], we focused on the bias adjustment for two background-adjusted dissimilarity measures d2s and d2∗ in this study.
We evaluated the performance of Skmer [8], a recent alignment-free method that corrects the formula of Mash distance based on NGS samples by estimating the sequencing depth and sequencing error rate, on the same primate dataset using kmer length K = 21 and sketch size s = 10^7.

Summary

Introduction

With the advent of next-generation sequencing (NGS) technologies, enormous amounts of sequence data are emerging rapidly. Although alignment-based approaches for sequence comparison are generally accurate and powerful, their applications are being challenged by the size of sequence data, which increases at an exponential rate. Alignment-free methods, especially kmer-based approaches that use the frequencies of kmers (k-words or k-grams) for sequence comparison, can be naturally adapted to shotgun NGS sequencing data without assembly [4, 5, 8, 9, 10, 11, 12]. Zielezinski et al. [9] published a comprehensive comparison of 74 alignment-free methods for 5 research applications: cis-regulatory module detection, protein sequence classification, gene tree inference, genome-based phylogeny, and reconstruction of species trees under sequence rearrangements.
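To make the kmer-frequency idea concrete, the following is a minimal sketch, not Afann's implementation. It counts overlapping kmers in a genome or in a pool of unassembled reads and compares two frequency profiles. The function names are hypothetical, and the cosine dissimilarity is only a simple stand-in: it applies no background adjustment, so it is neither d2s nor d2∗, and it does not include the neural network bias adjustment described in the abstract.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping kmer of length k in one sequence."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip kmers that contain N or any other non-ACGT symbol.
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def pooled_read_counts(reads, k: int) -> Counter:
    """Sum kmer counts over unassembled reads (no assembly step needed)."""
    total = Counter()
    for read in reads:
        total.update(kmer_counts(read, k))
    return total


def kmer_frequencies(counts: Counter) -> dict:
    """Convert raw counts to relative frequencies."""
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()} if total else {}


def cosine_dissimilarity(f: dict, g: dict) -> float:
    """1 - cosine similarity of two kmer frequency profiles (illustrative only)."""
    dot = sum(v * g.get(w, 0.0) for w, v in f.items())
    norm_f = sqrt(sum(v * v for v in f.values()))
    norm_g = sqrt(sum(v * v for v in g.values()))
    if norm_f == 0 or norm_g == 0:
        raise ValueError("both profiles must contain at least one kmer")
    return 1.0 - dot / (norm_f * norm_g)
```

For example, `cosine_dissimilarity(kmer_frequencies(kmer_counts(genome_a, 4)), kmer_frequencies(pooled_read_counts(reads_b, 4)))` compares an assembled genome with a raw read sample, where `genome_a` and `reads_b` are placeholder inputs. Profiles built from reads also reflect sequencing depth and sequencing errors, which is the setting in which the abstract reports overestimated dissimilarity.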



