Abstract

Metagenomics, the study of microbial genomes within diverse environments, is a rapidly developing field. Identifying microbial sequences within a host organism enables the study of human intestinal, respiratory, and skin microbiota, and has enabled the identification of novel viruses in diseases such as Merkel cell carcinoma. Few publicly available tools exist for metagenomic analysis of high-throughput sequence data. We present Integrated Metagenomic Sequence Analysis (IMSA), a flexible, fast, and robust computational analysis pipeline that is available for public use. IMSA takes input sequences from high-throughput datasets and uses a user-defined host database to filter out host sequences. It then aligns the filtered reads to a user-defined universal database to characterize exogenous reads within the host background. IMSA assigns a score to each node of the taxonomy based on read frequency and can output these scores either as a taxonomy report suitable for cluster analysis or as a taxonomy map (TaxMap). IMSA also outputs the specific reads assigned to a taxon of interest for downstream analysis. We demonstrate the use of IMSA to detect pathogens and normal flora within sequence data from a primary human cervical cancer carrying HPV16, a primary human cutaneous squamous cell carcinoma carrying HPV16, the CaSki cell line carrying HPV16, and the HeLa cell line carrying HPV18.
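The taxonomy-scoring step described above can be illustrated with a minimal Python sketch. It assumes a hypothetical scoring rule that is not necessarily IMSA's exact method: each filtered read contributes a total score of 1, split equally among the taxa of its best alignments, and every share is added to that taxon and to all of its ancestors. The taxonomy is represented as a child-to-parent dictionary, and the taxon names and read assignments are toy values for illustration only.

```python
from collections import defaultdict


def score_taxonomy(read_hits, parent):
    """Score every taxonomy node from each read's best-hit taxa.

    read_hits: read ID -> list of taxon IDs for that read's best alignments.
    parent:    taxon ID -> parent taxon ID; the root maps to itself.

    Hypothetical scoring rule (an assumption for illustration): each read
    contributes a total of 1.0, split equally among its distinct best-hit
    taxa, and each share is added to that taxon and all of its ancestors.
    """
    scores = defaultdict(float)
    for taxa in read_hits.values():
        distinct = set(taxa)
        if not distinct:
            continue  # read with no hits contributes nothing
        share = 1.0 / len(distinct)
        for taxon in distinct:
            node = taxon
            while True:
                scores[node] += share
                if parent[node] == node:  # reached the root
                    break
                node = parent[node]
    return dict(scores)


# Illustrative toy taxonomy and read assignments (not real data).
parent = {
    "root": "root",
    "Viruses": "root",
    "Papillomaviridae": "Viruses",
    "HPV16": "Papillomaviridae",
    "HPV18": "Papillomaviridae",
    "Bacteria": "root",
}
read_hits = {
    "read1": ["HPV16"],
    "read2": ["HPV16", "HPV18"],  # ambiguous read: split 0.5 / 0.5
    "read3": ["Bacteria"],
}

scores = score_taxonomy(read_hits, parent)
# A simple tab-separated taxonomy report, highest score first.
for taxon, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{taxon}\t{value:.2f}")
```

In this sketch, splitting an ambiguous read across its hits keeps the root score equal to the number of reads with hits, so scores at any level of the tree can be read as (fractional) read counts.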

Highlights

  • Metagenomics, the study of microbial genomes within diverse environmental samples, has rapidly developed as a field since its introduction in 1998 [1]

  • RNA-seq data from 150 bp paired-end reads were analyzed for three samples: the HeLa cell line, which contains HPV18; a primary cervical cancer containing HPV16; and a primary periungual squamous cell carcinoma containing HPV16 [23]

  • We present IMSA, a system for Integrated Metagenomic Sequence Analysis of high-throughput sequence data


Summary

Introduction

Metagenomics, the study of microbial genomes within diverse environmental samples, has rapidly developed as a field since its introduction in 1998 [1]. One existing tool, PARSES (Pipeline for Analysis of RNA-Seq Exogenous Sequences), uses BLAST+ for rapid filtering of human reads, followed by MEGAN for visualization of metagenomic data [15].
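As a sketch of the host-read filtering that PARSES performs with BLAST+, the Python below reads BLAST+ tabular output (`-outfmt 6`, whose default twelve columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore) from a search of the reads against a host database, flags reads with a strong host hit, and drops them. The function names and the e-value and percent-identity cutoffs are hypothetical choices for illustration; neither PARSES nor IMSA is claimed to use these exact rules.

```python
import csv
import io


def host_read_ids(blast_tabular, max_evalue=1e-5, min_pident=90.0):
    """Collect IDs of reads with a qualifying hit in BLAST+ -outfmt 6 output.

    Default outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore. The cutoffs are illustrative
    assumptions, not defaults of PARSES or IMSA.
    """
    host_ids = set()
    for row in csv.reader(blast_tabular, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, pident, evalue = row[0], float(row[2]), float(row[10])
        if evalue <= max_evalue and pident >= min_pident:
            host_ids.add(qseqid)
    return host_ids


def remove_host_reads(reads, host_ids):
    """Keep only (read_id, sequence) pairs that were not flagged as host."""
    return [(rid, seq) for rid, seq in reads if rid not in host_ids]


# Toy BLAST+ tabular output against a host database (not real data).
blast_out = io.StringIO(
    "read1\tchr1\t99.33\t150\t1\t0\t1\t150\t10001\t10150\t1e-70\t272\n"
    "read2\tchr7\t72.50\t40\t11\t0\t5\t44\t5001\t5040\t0.5\t30.1\n"
)
reads = [("read1", "ACGTACGT"), ("read2", "TTGACCGA"), ("read3", "GGCATTAC")]

host = host_read_ids(blast_out)
remaining = remove_host_reads(reads, host)
print(sorted(host), [rid for rid, _ in remaining])
```

Here read1 is removed as host, while read2 (a weak hit) and read3 (no hit) are kept as candidates for alignment against a broader database.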
