Abstract
Whole exome sequencing has facilitated the discovery of causal genetic variants associated with human diseases at deep coverage and low cost. In particular, the detection of somatic mutations from tumor/normal pairs has provided insights into the cancer genome. Although there is an abundance of publicly available software for the detection of germline and somatic variants, concordance is generally limited among variant callers and alignment algorithms. Successful integration of variants detected by multiple methods requires in-depth knowledge of the software, access to high-performance computing resources, and advanced programming techniques. We present ExScalibur, a set of fully automated, highly scalable and modular pipelines for whole exome data analysis. The suite integrates multiple alignment and variant calling algorithms for the accurate detection of germline and somatic mutations with close to 99% sensitivity and specificity. ExScalibur implements streamlined execution of analytical modules, real-time monitoring of pipeline progress, robust handling of errors and intuitive documentation that allows for increased reproducibility and sharing of results and workflows. It runs on local computers, high-performance computing clusters and cloud environments. In addition, we provide a data analysis report utility that facilitates visualization of the results and offers interactive exploration of quality control files, read alignments and variant calls, assisting downstream prioritization of potential disease-causing mutations. ExScalibur is open-source and is also available as a public image on Amazon cloud.
Highlights
Next-Generation Sequencing (NGS) technologies are rapidly becoming the most popular high-throughput strategy for drug discovery and biomedical research in the post-genome era
We illustrated discrepancies between two commonly used somatic mutation detection approaches: the GMD-derived subtraction method, which identifies somatic calls by contrasting the genotypes of paired tumor/normal samples, and direct detection by the somatic callers in the SMD pipeline
The pipelines cover the complete workflow from raw reads to variant calling and annotation, allowing accurate detection of germline and somatic variants in the human genome
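The subtraction approach mentioned in the highlights above can be illustrated with a minimal sketch. This is not ExScalibur's implementation: the function names, the site key layout and the toy genotypes are all hypothetical, and the sketch assumes VCF-style genotype strings and that a site missing from the normal call set is homozygous reference. It keeps tumor variants whose matched normal genotype is homozygous reference, then compares that set with a set of calls from a direct somatic caller.

```python
from typing import Dict, Set, Tuple

# Hypothetical site key: (chromosome, position, reference allele, alternate allele).
Site = Tuple[str, int, str, str]

HOM_REF = "0/0"  # VCF-style homozygous-reference genotype


def subtract_genotypes(tumor: Dict[Site, str], normal: Dict[Site, str]) -> Set[Site]:
    """Return tumor variant sites whose matched normal genotype is homozygous reference.

    Assumption of this sketch: a site absent from the normal call set is
    treated as homozygous reference (read depth in the normal is ignored).
    """
    somatic = set()
    for site, tumor_gt in tumor.items():
        if tumor_gt == HOM_REF:
            continue  # no variant allele called in the tumor
        if normal.get(site, HOM_REF) == HOM_REF:
            somatic.add(site)
    return somatic


def compare_call_sets(subtraction: Set[Site], direct: Set[Site]) -> Dict[str, Set[Site]]:
    """Split two somatic call sets into shared and method-specific calls."""
    return {
        "shared": subtraction & direct,
        "subtraction_only": subtraction - direct,
        "direct_only": direct - subtraction,
    }


# Toy genotypes for one tumor/normal pair (invented for illustration).
tumor_calls = {
    ("chr1", 100, "A", "T"): "0/1",
    ("chr2", 200, "G", "C"): "0/1",
    ("chr3", 300, "C", "T"): "1/1",
}
normal_calls = {
    ("chr1", 100, "A", "T"): "0/0",
    ("chr2", 200, "G", "C"): "0/1",  # also present in normal, so germline
}
# Toy output of a direct somatic caller (invented for illustration).
direct_calls = {("chr1", 100, "A", "T"), ("chr4", 400, "T", "G")}

by_subtraction = subtract_genotypes(tumor_calls, normal_calls)
comparison = compare_call_sets(by_subtraction, direct_calls)
```

In this toy data the chr2 variant is dropped because the normal sample carries it, while the chr3 site is kept only because of the missing-equals-reference assumption. Low coverage in the normal can produce exactly that kind of call, which is one way the two approaches may come to disagree.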
Summary
Next-Generation Sequencing (NGS) technologies are rapidly becoming the most popular high-throughput strategy for drug discovery and biomedical research in the post-genome era. Whole Exome Sequencing (WES) is a powerful and cost-effective approach for the detection of [...]
PLOS ONE | DOI:10.1371/journal.pone.0135800. ExScalibur Suite for WES Germline and Somatic Mutation Identification
[...] table S16) to retrieve the AML datasets from The Cancer Genomics Hub (CGHub). Sample IDs are shown as "tumor, normal" pairs. The ExScalibur pipeline is available from GitHub (https://github.com/cribioinfo). We have developed a website that hosts general information as well as instructions, tutorials and release notes for ExScalibur.