Abstract

BackgroundThe assembly of Next Generation Sequencing (NGS) reads remains a challenging task. This is especially true for the assembly of metagenomics data that originate from environmental samples potentially containing hundreds to thousands of unique species. The principle objective of current assembly tools is to assemble NGS reads into contiguous stretches of sequence called contigs while maximizing for both accuracy and contig length. The end goal of this process is to produce longer contigs with the major focus being on assembly only. Sequence read assembly is an aggregative process, during which read overlap relationship information is lost as reads are merged into longer sequences or contigs. The assembly graph is information rich and capable of capturing the genomic architecture of an input read data set. We have developed a novel hybrid graph in which nodes represent sequence regions at different levels of granularity. This model, utilized in the assembly and analysis pipeline Focus, presents a concise yet feature rich view of a given input data set, allowing for the extraction of biologically relevant graph structures for graph mining purposes.ResultsFocus was used to create hybrid graphs to model metagenomics data sets obtained from the gut microbiomes of five individuals with Crohn’s disease and eight healthy individuals. Repetitive and mobile genetic elements are found to be associated with hybrid graph structure. Using graph mining techniques, a comparative study of the Crohn’s disease and healthy data sets was conducted with focus on antibiotics resistance genes associated with transposase genes. Results demonstrated significant differences in the phylogenetic distribution of categories of antibiotics resistance genes in the healthy and diseased patients. Focus was also evaluated as a pure assembly tool and produced excellent results when compared against the Meta-velvet, Omega, and UD-IDBA assemblers.ConclusionsMining the hybrid graph can reveal biological phenomena captured by its structure. We demonstrate the advantages of considering assembly graphs as data-mining support in addition to their role as frameworks for assembly.Electronic supplementary materialThe online version of this article (doi:10.1186/s12864-016-2678-2) contains supplementary material, which is available to authorized users.

Highlights

  • The assembly of Generation Sequencing (NGS) reads remains a challenging task

  • Identify and characterize the phylogenetic distribution of antibiotic resistance gene classes associated with transposase/integrase sequences in healthy individuals and individuals with Crohn’s disease

  • 2) Identify and characterize the phylogenetic distribution of antibiotic resistance gene classes associated with transposase/integrase sequences in healthy individuals and individuals with Crohn’s disease

Read more

Summary

Introduction

The assembly of Generation Sequencing (NGS) reads remains a challenging task This is especially true for the assembly of metagenomics data that originate from environmental samples potentially containing hundreds to thousands of unique species. We have developed a novel hybrid graph in which nodes represent sequence regions at different levels of granularity. This model, utilized in the assembly and analysis pipeline Focus, presents a concise yet feature rich view of a given input data set, allowing for the extraction of biologically relevant graph structures for graph mining purposes. NGS machines produce short DNA sequences called reads which are randomly sampled at a very high coverage from environmental DNA. Illumina technologies currently are capable of producing NGS reads anywhere from 125 bps to 300 bps in length with

Objectives
Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.