Abstract

Background

Every next generation sequencing (NGS) platform relies on proprietary and open source computational tools to analyze sequencing data. NGS tools for Illumina platforms are well documented, which is not the case for AB SOLiD systems. We applied several computational and variant calling pipelines to analyse targeted exome sequencing data obtained using the AB SOLiD 5500 system. The tools we investigated comprised the proprietary LifeScope pipeline in combination with open source color-space competent mapping programs and a variant caller. We present instrumental details of the pipelines that were used and a quantitative comparative analysis of the variant lists generated by the LifeScope pipeline versus open source tools.

Results

Sufficient coverage of targeted regions was achieved by all investigated pipelines. High variability was observed in the identities of variants across the mapping programs. We observed less than 50 % concordance between variant lists produced by approaches based on different mapping algorithms. We summarized the different approaches with regard to the coverage (DP) and quality (QUAL) properties of the variants provided by GATK and found that the LifeScope computational pipeline is superior. Fusing information on mapping profiles (pileup) at the genomic positions of variants across several different alignments proved to be a useful strategy for assessing questionable singleton variants.

Conclusions

Our quantitative comparison supports the conclusion that the LifeScope pipeline is superior for processing sequencing data obtained with the AB SOLiD 5500 system. Nevertheless, the use of alternative pipelines is encouraged, because aggregating information from other mapping and variant calling approaches helps to resolve questionable calls and increases confidence in each call. The coverage threshold above which a variant is considered for further analysis has to be chosen in a data-driven way to prevent the loss of important information.

Highlights

  • Every next generation sequencing (NGS) platform relies on proprietary and open source computational tools to analyze sequencing data

  • The creators of the Genome Analysis Toolkit (GATK) recommend that 80 % of targeted regions be covered at least 20× in order to achieve good results with GATK

  • When the mappers are compared with each other, Mapping and Assembly with Qualities (MAQ) can map about 28 % of the reads left unmapped by Blat-like Fast Accurate Search Tool (BFAST) and about 19 % of the reads left unmapped by Short Read Mapping Package (SHRiMP)
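The coverage recommendation in the highlights can be checked with a few lines of Python. This is a minimal sketch, not part of GATK or of the pipelines described here: the function names are hypothetical, and it assumes the rule is applied per targeted base, with the input being a list of read depths at targeted positions. Applying the rule per region instead would need a different input.

```python
def fraction_covered(depths, min_depth=20):
    """Fraction of targeted positions with read depth >= min_depth.

    `depths` is assumed to hold one depth value per targeted base.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def meets_coverage_guideline(depths, min_depth=20, min_fraction=0.80):
    """True if at least `min_fraction` of positions reach `min_depth`."""
    return fraction_covered(depths, min_depth) >= min_fraction


# Illustrative depths only, not data from the study.
good_target = [30] * 9 + [5]        # 9 of 10 positions at >= 20x
poor_target = [30] * 7 + [10] * 3   # 7 of 10 positions at >= 20x
print(meets_coverage_guideline(good_target))
print(meets_coverage_guideline(poor_target))
```

Both thresholds are left as parameters, in line with the observation in the abstract that coverage cut-offs should be chosen in a data-driven way.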


Summary

Introduction

Every next generation sequencing (NGS) platform relies on proprietary and open source computational tools to analyze sequencing data. We applied several computational and variant calling pipelines to analyse targeted exome sequencing data obtained using the AB SOLiD 5500 system. The tools we investigated comprised the proprietary LifeScope pipeline in combination with open source color-space competent mapping programs and a variant caller. The Illumina platform is based on sequencing by synthesis and uses letter-based nucleotide encoding. The SOLiD platform employs a different, ligation-based sequencing strategy and uses color-space encoding. Two-base encoding greatly facilitates the identification of sequencing errors because each base is interrogated twice by the ligation chemistry. This strategy increases confidence that variations observed at specific genomic locations are true single nucleotide variants.
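The effect of two-base encoding can be illustrated with a small Python sketch. It assumes the commonly described SOLiD color scheme, in which the bases are numbered A=0, C=1, G=2, T=3 and each color is the XOR of two adjacent base codes. The function names are hypothetical and are not taken from LifeScope or from any of the mappers discussed here.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_colors(seq):
    """Encode a nucleotide string as a list of colors (0-3).

    Each color describes one pair of adjacent bases, so every
    interior base contributes to two colors.
    """
    codes = [BASE_TO_CODE[b] for b in seq]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def decode_colors(first_base, colors):
    """Decode colors back to nucleotides, given the known first base."""
    code = BASE_TO_CODE[first_base]
    bases = [first_base]
    for color in colors:
        code ^= color
        bases.append(CODE_TO_BASE[code])
    return "".join(bases)


def color_mismatches(colors_a, colors_b):
    """Indices at which two equal-length color lists differ."""
    return [i for i, (a, b) in enumerate(zip(colors_a, colors_b)) if a != b]


reference = "ATGGCA"
variant = "ATGACA"  # single nucleotide change at index 3 (G -> A)

ref_colors = encode_colors(reference)
var_colors = encode_colors(variant)
print(ref_colors, var_colors)
print(color_mismatches(ref_colors, var_colors))
```

In this toy example the single nucleotide change alters two adjacent colors, whereas a single miscalled color would alter only one. Under the stated encoding, that difference is what allows an isolated color mismatch to be treated as a likely sequencing error rather than a variant.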
