What's in your next-generation sequence data? An exploration of unmapped DNA and RNA sequence reads from the bovine reference individual.

Lynsey K Whitacre,Robert D Schnabel,Juan F Medrano,Leeson J Alexander,Polyana C Tizioto,Tad S Sonstegard,Steven G Schroeder,Jaewoo Kim,Jared E Decker,Jeremy F Taylor

doi:10.1186/s12864-015-2313-7

Abstract

Background
Next-generation sequencing projects commonly commence by aligning reads to a reference genome assembly. While improvements in alignment algorithms and computational hardware have greatly enhanced the efficiency and accuracy of alignments, a significant percentage of reads often remain unmapped.

Results
We generated de novo assemblies of unmapped reads from the DNA and RNA sequencing of the Bos taurus reference individual and identified the closest matching sequence to each contig by alignment to the NCBI non-redundant nucleotide database using BLAST. As expected, many of these contigs represent vertebrate sequence that is absent, incomplete, or misassembled in the UMD3.1 reference assembly. However, numerous additional contigs represent invertebrate species. Most prominent were several species of Spirurid nematodes and a blood-borne parasite, Babesia bigemina. These species are either not present in the US or are not known to infect taurine cattle, and the reference animal appears to have been host to unsequenced sister species.

Conclusions
We demonstrate the importance of exploring unmapped reads to ascertain sequences that are either absent or misassembled in the reference assembly and for detecting sequences indicative of parasitic or commensal organisms.

Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-2313-7) contains supplementary material, which is available to authorized users.
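The first step described above, collecting the reads that failed to align so they can be assembled de novo, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes plain-text SAM input and relies on the SAM FLAG bit 0x4 ("segment unmapped"). The function names are hypothetical. In practice, `samtools view -f 4` applies the same flag filter.

```python
from typing import Iterable, Iterator, Tuple

UNMAPPED = 0x4         # SAM FLAG bit: segment unmapped
SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment


def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) for each unmapped primary SAM record."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # keep only one record per read
        if flag & UNMAPPED:
            yield qname, seq, qual


def to_fastq(records: Iterable[Tuple[str, str, str]]) -> str:
    """Render (name, sequence, quality) records as FASTQ text for an assembler."""
    return "".join(f"@{name}\n{seq}\n+\n{qual}\n" for name, seq, qual in records)
```

For example, given one mapped record (FLAG 0) and one unmapped record (FLAG 4), only the unmapped read is written to the FASTQ output. The FASTQ output would then be passed to a de novo assembler of your choice.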

Highlights

Next-generation sequencing projects commonly commence by aligning reads to a reference genome assembly
We identified DNA and RNA contigs, assembled de novo from unmapped reads, that could generally be classified into one of three categories: 1) bovine sequence; 2) sequence from other vertebrate species that was homologous to bovine; and 3) sequence from non-vertebrate species
Most of the contigs assembled de novo from unmapped reads that were identified as representing a non-vertebrate species were composed of reads that originated from multiple libraries sequenced at separate facilities
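The three-way classification above can be sketched by keeping each contig's best BLAST hit and mapping that hit's lineage onto a category. This is a minimal illustration under stated assumptions, not the authors' procedure. It assumes BLAST tabular output (`-outfmt 6`, 12 default columns, with bitscore in column 12). It also assumes a hypothetical `lineage_of` lookup from subject ID to taxonomic lineage, because the default tabular output carries no taxonomy.

```python
from typing import Dict, Iterable, List, Tuple


def best_hits(blast_rows: Iterable[str]) -> Dict[str, str]:
    """Return {query_id: subject_id} for the highest-bitscore hit of each query.

    Assumes -outfmt 6 with the 12 default columns: column 1 is qseqid,
    column 2 is sseqid and column 12 is bitscore.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for row in blast_rows:
        if not row.strip() or row.startswith("#"):
            continue  # skip blank and comment lines
        fields = row.rstrip("\n").split("\t")
        qseqid, sseqid, bitscore = fields[0], fields[1], float(fields[11])
        if qseqid not in best or bitscore > best[qseqid][1]:
            best[qseqid] = (sseqid, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def categorize(lineage: List[str]) -> str:
    """Map a lineage (root to species) onto the three categories."""
    if "Bos taurus" in lineage:
        return "bovine"
    if "Vertebrata" in lineage:
        return "other vertebrate"
    return "non-vertebrate"


def classify_contigs(blast_rows: Iterable[str],
                     lineage_of: Dict[str, List[str]]) -> Dict[str, str]:
    """Classify each contig by the lineage of its best hit.

    `lineage_of` is a hypothetical subject-ID -> lineage lookup supplied by
    the caller (e.g. built from a taxonomy dump); it is not part of BLAST output.
    """
    return {contig: categorize(lineage_of[subject])
            for contig, subject in best_hits(blast_rows).items()}
```

Ranking hits by bitscore alone is a simplifying choice. A real analysis might also apply identity or coverage thresholds before assigning a contig to a category.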

Summary

Introduction

Next-generation sequencing projects commonly commence by aligning reads to a reference genome assembly. Next-generation sequencing technology has vastly increased the scale of sequencing projects and routinely generates hundreds of millions or even billions of short reads. Analysis of these data requires that the short reads be assembled into contiguous sequences, using either de novo or reference-guided assembly. For organisms with a reference genome, the reads generated in the sequencing process are usually matched to the reference sequence with one of a variety of alignment algorithms. This is currently the most efficient way of transforming the raw sequence reads into a consensus sequence.
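The final step described above, turning reads placed on a reference into a consensus, can be illustrated with a toy majority vote. This is a simplified sketch, not how production pipelines work. It assumes ungapped reads whose reference start positions are already known. The function name and the `min_depth` parameter are hypothetical. Real pipelines use gapped aligners and quality-aware callers.

```python
from collections import Counter
from typing import List, Tuple


def consensus(ref_len: int, placed_reads: List[Tuple[int, str]],
              min_depth: int = 1) -> str:
    """Majority-vote consensus over reference positions 0..ref_len-1.

    Each read is (start, sequence) and is assumed to align without gaps.
    Positions covered by fewer than `min_depth` reads are reported as 'N'.
    """
    columns = [Counter() for _ in range(ref_len)]
    for start, seq in placed_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < ref_len:  # ignore bases hanging off the reference
                columns[pos][base] += 1
    bases = []
    for column in columns:
        if sum(column.values()) < min_depth:
            bases.append("N")  # too little coverage to call a base
        else:
            bases.append(column.most_common(1)[0][0])
    return "".join(bases)
```

For example, `consensus(6, [(0, "ACGT"), (2, "GTTA"), (2, "GAT")])` returns `"ACGTTA"`: at position 3 the two `T` calls outvote the single `A`. Raising `min_depth` to 2 masks the singly covered ends and gives `"NNGTTN"`.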

Journal: BMC Genomics	Publication Date: Dec 1, 2015
Citations: 51	License type: CC BY 4.0
