Abstract

Background: Accurate somatic mutation-calling is essential for insightful mutation analyses in cancer studies. Several mutation-callers are publicly available and more are likely to appear. Nonetheless, mutation-calling is still challenging and there is unlikely to be one established caller that systematically outperforms all others. Therefore, fully utilizing multiple callers can be a powerful way to construct a list of final calls for one’s research.

Results: Using a set of mutations from multiple callers that are impartially validated, we present a statistical approach for building a combined caller, which can be applied to combine calls in a wider dataset generated using a similar protocol. Using the mutation outputs and the validation data from The Cancer Genome Atlas endometrial study (6,746 sites), we demonstrate how to build a statistical model that predicts the probability of each call being a somatic mutation, based on the detection status of multiple callers and a few associated features.

Conclusion: The approach allows us to build a combined caller across the full range of stringency levels, which outperforms all of the individual callers.

Highlights

  • Accurate somatic mutation-calling is essential for insightful mutation analyses in cancer studies

  • With the burst of high-throughput sequencing data generated in recent years, extensive efforts have been made towards accurate somatic mutation-calling

  • In Section ‘Improving a single caller’s performance using details of its filters’, we show the potential for improving an individual caller’s performance by using its more detailed outputs, taking Caller B as an example

Summary

Introduction

Accurate somatic mutation-calling is essential for insightful mutation analyses in cancer studies. Several mutation-callers are publicly available and more are likely to appear. Mutation-calling is still challenging and there is unlikely to be one established caller that systematically outperforms all others. Somatic mutations are genetic changes that occur in somatic cells after conception. Cancer is driven by such somatic alterations, and cataloging somatic mutations is essential to understand the genetic bases of cancer development. Additional in-house callers are likely to be under development for ongoing studies. Many challenges remain to be addressed, for example, removing artifactual variants from multiple sources, detecting rare variants in highly heterogeneous tumor samples, and detecting variants at shallower sequencing coverage.
