Abstract

High-throughput next-generation sequencing (NGS) has become exceedingly cheap, enabling studies with large numbers of samples. Quality control (QC) is an essential stage of any analytic pipeline, and the outputs of popular bioinformatics tools such as FastQC and Picard provide information on individual samples. Although these tools are powerful for QC, large sample numbers make it challenging to inspect every sample and to identify systemic bias. We present ngsReports, an R package designed for the management and visualization of NGS reports from within an R environment. The available methods allow direct import into R of FastQC reports along with outputs from other tools. Visualization can be carried out across many samples using default, highly customizable plots, with options to perform hierarchical clustering to quickly identify outlier libraries. Moreover, these plots can be displayed in an interactive shiny app or HTML report for ease of analysis. The ngsReports package is available on Bioconductor and the GUI shiny app is available at https://github.com/UofABioinformaticsHub/shinyNgsreports. Supplementary data are available at Bioinformatics online.

Highlights

  • We present ngsReports, an R package designed for the management and visualization of Next Generation Sequencing (NGS) reports from within an R environment

  • The Next Generation Sequencing (NGS) boom in genetics has provided researchers with unparalleled resources to answer fundamental questions in population and medical genetics


Summary

Results

We present ngsReports, an R package designed for the management and visualization of NGS reports from within an R environment. The available methods allow direct import into R of FastQC output as well as output from aligners such as HISAT2, STAR and Bowtie. Visualization can be carried out across many samples as heatmaps rendered with ggplot2 and plotly. These can be displayed in an interactive shiny app or an HTML report. We provide methods to assess observed GC content in an organism-dependent manner for both transcriptomic and genomic datasets. Hierarchical clustering can be applied to heatmaps with large sample sizes to quickly identify outliers and batch effects.

Availability and Implementation

ngsReports is available at https://github.com/UofABioinformaticsHub/ngsReports.
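
To make concrete the kind of cross-sample summary described above, the following is a minimal sketch in plain Python. It is not the ngsReports implementation and does not use its API. It assumes only the standard layout of a FastQC `fastqc_data.txt` file, in which each module begins with a line of the form `>>Module name<TAB>status`. The function names, the toy reports and the `max_fail` threshold are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def parse_module_statuses(text):
    """Return {module name: STATUS} from the text of one fastqc_data.txt file."""
    statuses = {}
    for line in text.splitlines():
        line = line.strip()
        # Module headers look like ">>Per base sequence quality<TAB>pass";
        # ">>END_MODULE" closes a module and carries no status.
        if line.startswith(">>") and line != ">>END_MODULE":
            name, _, status = line[2:].partition("\t")
            statuses[name] = status.strip().upper()
    return statuses


def summarise(reports):
    """Map {sample: report text} to {sample: {module: STATUS}}."""
    return {sample: parse_module_statuses(text) for sample, text in reports.items()}


def flag_samples(summary, max_fail=1):
    """List samples with more than max_fail FAIL modules (illustrative threshold)."""
    return sorted(
        sample
        for sample, modules in summary.items()
        if Counter(modules.values())["FAIL"] > max_fail
    )


# Toy reports (hypothetical data) containing only module header lines.
demo = {
    "sampleA": (
        ">>Basic Statistics\tpass\n>>END_MODULE\n"
        ">>Per base sequence quality\tpass\n>>END_MODULE\n"
        ">>Per sequence GC content\twarn\n>>END_MODULE\n"
    ),
    "sampleB": (
        ">>Basic Statistics\tpass\n>>END_MODULE\n"
        ">>Per base sequence quality\tfail\n>>END_MODULE\n"
        ">>Per sequence GC content\tfail\n>>END_MODULE\n"
    ),
}

summary = summarise(demo)
flagged = flag_samples(summary)
print(flagged)
```

A sample-by-module table of this kind is the sort of input a status heatmap could be drawn from. The fixed failure count used here is only a stand-in for the hierarchical clustering the package applies to identify outliers.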
