The Genomic Scrapheap Challenge; Extracting Relevant Data from Unmapped Whole Genome Sequencing Reads, Including Strain Specific Genomic Segments, in Rats.

Robin H Van Der Weide, Pim Toonen, Edwin Cuppen, Roel Hermsen, Marieke Simonis, Joep De Ligt

doi:10.1371/journal.pone.0160036

Abstract

Unmapped next-generation sequencing reads are typically ignored, although they contain biologically relevant information. We systematically analyzed unmapped reads from whole genome sequencing of 33 inbred rat strains. High-quality reads were selected and enriched for biologically relevant sequences; similarity-based analysis revealed clustering similar to previously reported phylogenetic trees. Our results demonstrate that, on average, 20% of all unmapped reads harbor sequences that can be used to improve reference genomes and generate hypotheses on potential genotype-phenotype relationships. Analysis pipelines would benefit from incorporating the described methods, and reference genomes would benefit from inclusion of the genomic segments obtained through these efforts.

Highlights

Next-generation sequencing (NGS) is used in a large variety of applications ranging from single cell analyses to complex microbial communities and complete vertebrate and plant genome analyses [1].
We aligned whole genome sequencing (WGS) data of 33 rat strains to the latest rat reference genome assembly (BN/NHsdMcWi, RGSC5.0) to identify ‘unmappable’ reads (Table 1).
By comparing the current rat reference genome with WGS data obtained from the same animal that was used for creating this reference, we found that 39% of the total unmapped reads are due to missing sequences in the reference genome.
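
As a rough illustration of what identifying unmapped reads can look like in practice, the sketch below scans alignment records in SAM text format and keeps reads whose FLAG has the "segment unmapped" bit (0x4) set, then applies a mean base-quality cutoff. This is not the authors' pipeline: the function names, the Phred+33 quality encoding, and the cutoff of 20 are assumptions chosen for illustration only.

```python
def mean_phred(qual, offset=33):
    """Mean Phred score of a SAM QUAL string (Phred+33 encoding assumed)."""
    if not qual or qual == "*":
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def unmapped_reads(sam_lines, min_mean_quality=20.0):
    """Yield (read_name, sequence) for unmapped reads above a quality cutoff.

    min_mean_quality is a hypothetical threshold, not a value from the paper.
    """
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        # Bit 0x4 of FLAG marks a read the aligner could not place.
        if flag & 0x4 and mean_phred(fields[10]) >= min_mean_quality:
            yield fields[0], fields[9]
```

Calling `list(unmapped_reads(open("sample.sam")))` on a hypothetical `sample.sam` would collect the names and sequences of retained reads, which could then be passed to downstream steps such as assembly or similarity searches.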

Summary

Introduction

Next-generation sequencing (NGS) is used in a large variety of applications ranging from single cell analyses to complex microbial communities and complete vertebrate and plant genome analyses [1]. NGS reads are, in general, aligned to an organism-specific reference genome as a first step in data analysis. Such reference genomes are typically derived from a single individual, animal or strain, with the exception of the human reference genome. Reads that align (map) to the reference genome are subsequently used for data analysis, while the unmapped reads are usually discarded [2,3]. Filtering out reads originating from the first source is fairly straightforward and implemented in most data processing procedures by discarding reads with low quality scores [4,5]. The second source of unmapped reads often contains sequences from exogenous species due to experimental and sampling

Journal: PLOS ONE	Publication Date: Aug 8, 2016
Citations: 5	License type: CC BY 4.0

Similar Papers

Exploring the unmapped DNA and RNA reads in a songbird genome
Veronika N Laine ... Marcel E Visser
BMC Genomics | VOL. 20
08 Jan 2019

Another lesson from unmapped reads: in-depth analysis of RNA-Seq reads from various horse tissues.
Artur Gurgul ... Zbigniew Arent
Journal of applied genetics | VOL. 63
07 Jun 2022

What's in your next-generation sequence data? An exploration of unmapped DNA and RNA sequence reads from the bovine reference individual.
Lynsey K Whitacre ... Jeremy F Taylor
BMC Genomics | VOL. 16
01 Dec 2015

Identifying micro-inversions using high-throughput sequencing reads.
Feifei He ... Yu-Hang Tang
BMC Genomics | VOL. Suppl 17 1
01 Jan 2015
