MaSC: mappability-sensitive cross-correlation for estimating mean fragment length of single-end short-read sequencing data

Parameswaran Ramachandran, Theodore J Perkins, Christopher J Porter, Gareth A Palidwor

doi:10.1093/bioinformatics/btt001

Open Access

Journal: Bioinformatics
Publication Date: Jan 7, 2013
Citations: 39
License type: CC BY 3.0

Affiliation: University of Ottawa

Abstract

Motivation: Reliable estimation of the mean fragment length for next-generation short-read sequencing data is an important step in next-generation sequencing analysis pipelines, most notably because of its impact on the accuracy of the enriched regions identified by peak-calling algorithms. Although many peak-calling algorithms include a fragment-length estimation subroutine, the problem has not been adequately solved, as demonstrated by the variability of the estimates returned by different algorithms.

Results: In this article, we investigate the use of strand cross-correlation to estimate mean fragment length of single-end data and show that traditional estimation approaches have mixed reliability. We observe that the mappability of different parts of the genome can introduce an artificial bias into cross-correlation computations, resulting in incorrect fragment-length estimates. We propose a new approach, called mappability-sensitive cross-correlation (MaSC), which removes this bias and allows for accurate and reliable fragment-length estimation. We analyze the computational complexity of this approach, and evaluate its performance on a test suite of NGS datasets, demonstrating its superiority to traditional cross-correlation analysis.

Availability: An open-source Perl implementation of our approach is available at http://www.perkinslab.ca/Software.html.

Contact: tperkins@ohri.ca

Supplementary information: Supplementary data are available at Bioinformatics online.
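
The approach described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' Perl implementation. It assumes that reads are summarized as per-position counts of 5' ends on each strand, that mappability is given as one boolean track per strand, and that the mappability correction amounts to computing the correlation at each shift only over position pairs that are mappable on both strands. The function names (naive_cc, masc_cc, best_shift) and the toy data are hypothetical, chosen only for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(xs)
    if n == 0:
        return 0.0
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def naive_cc(fwd, rev, d):
    """Correlate forward-strand counts at i with reverse-strand counts at i + d."""
    n = len(fwd) - d
    return pearson(fwd[:n], rev[d:d + n])

def masc_cc(fwd, rev, map_fwd, map_rev, d):
    """Same correlation, restricted to pairs (i, i + d) mappable on both strands."""
    keep = [i for i in range(len(fwd) - d) if map_fwd[i] and map_rev[i + d]]
    return pearson([fwd[i] for i in keep], [rev[i + d] for i in keep])

def best_shift(score, max_shift):
    """Return the shift in 0..max_shift with the highest score."""
    return max(range(max_shift + 1), key=score)

# Toy data (assumed): four fragments whose two ends are 7 positions apart.
genome_len = 200
true_shift = 7
fwd = [0] * genome_len
rev = [0] * genome_len
for start in [20, 50, 90, 130]:
    fwd[start] = 1
    rev[start + true_shift] = 1

# Assumed mappability: forward-strand positions 100..149 are unmappable.
map_fwd = [not (100 <= i < 150) for i in range(genome_len)]
map_rev = [True] * genome_len

naive_est = best_shift(lambda d: naive_cc(fwd, rev, d), 20)
masc_est = best_shift(lambda d: masc_cc(fwd, rev, map_fwd, map_rev, d), 20)
print(naive_est, masc_est)
```

On this small, noise-free example both estimators recover the same shift; the sketch shows only the mechanics of restricting the computation to doubly mappable positions, not the size of the bias reported in the paper. Converting the best shift into a fragment length depends on the coordinate convention used for reverse-strand reads.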

Highlights

Next-generation sequencing (NGS) technologies have revolutionized molecular biology with their unprecedented capacity for genome-wide measurement of protein–DNA interactions, chromatin state changes and transcription levels (Mardis, 2011).
For example, a DNA sample may be the result of a chromatin-immunoprecipitation experiment, in which DNA bound to a particular transcription factor (TF) is pulled down.
We have demonstrated that mappability can introduce a strong bias into genome-wide cross-correlation computations of positive- and negative-strand read densities.

Summary

Introduction

Next-generation sequencing (NGS) technologies have revolutionized molecular biology with their unprecedented capacity for genome-wide measurement of protein–DNA interactions, chromatin state changes and transcription levels (Mardis, 2011). Although NGS technologies differ in their details, most of the common platforms work by sequencing large numbers of short DNA fragments. These fragments may originate, for example, from simple extraction of DNA from a sample of cells, selective extraction based on a chromatin-immunoprecipitation pulldown or reverse transcription of RNA into DNA. When the organism has a canonical genome, the DNA fragment sequences are typically mapped back to it, so that their distribution, and especially sites of enrichment, may be studied (Pepke et al., 2009). The best practical alternative offered by typical current technologies is sequencing the fragments starting from both ends. Despite having a canonical genome assembly to which one end of each fragment can be mapped, most NGS experiments lack information on the other, unsequenced end of each fragment.

Methods

Results

Conclusion

Similar Papers

On the (im)possibility of reconstructing plasmids from whole-genome short-read sequencing data.
Sergio Arredondo-Alonso ... Anita C Schürch
Microbial Genomics | VOL. 3
18 Aug 2017

An assessment of bioinformatics tools for the detection of human endogenous retroviral insertions in short-read genome sequencing data.
Harry Bowles ... Alfredo Iacoangeli
Frontiers in bioinformatics | VOL. 2
08 Feb 2023

ESREEM: Efficient Short Reads Error Estimation Computational Model for Next-generation Genome Sequencing
Muhammad Tahir ... Muhammad Sardaraz
Current Bioinformatics | VOL. 16
01 Feb 2021

Improved Assembly of Metagenome-Assembled Genomes and Viruses in Tibetan Saline Lake Sediment by HiFi Metagenomic Sequencing.
Ye Tao ... Biao Li
Microbiology Spectrum | VOL. 11
08 Dec 2022
