Abstract

Accurate detection of somatic mutations is a crucial step toward understanding cancer. Various tools have been developed to detect somatic mutations from cancer genome sequencing data by mapping reads to a universal reference genome and inferring likelihoods from complex statistical models. However, read mapping is frequently obstructed by mismatches between the reference genome and the germline and somatic mutations carried on a read. Previously developed personalized genome tools are not compatible with downstream statistical models for somatic mutation detection. We present PRESM, a tool that builds personalized reference genomes by integrating germline mutations into the reference genome. PRESM circumvents this obstacle with a two-step germline substitution procedure that uses a workaround to maintain positional fidelity. Reads derived from tumor tissue can be positioned more accurately along a personalized reference than along a universal reference because of the reduced genetic distance between the subject (the tumor genome) and the target (the personalized genome). Application of PRESM's personalized genome reduced false-positive (FP) somatic mutation calls by as much as 55.5% and facilitated the discovery of a novel somatic point mutation on a germline insertion in PDE1A, a phosphodiesterase associated with melanoma. Moreover, all improvements in calling accuracy were achieved without parameter optimization, as PRESM itself is parameter-free. Hence, similar improvements in read mapping and decreases in the FP rate are expected to persist when PRESM-built genomes are applied to any user-provided dataset. The software is available at https://github.com/precisionomics/PRESM. Supplementary data are available at Bioinformatics online.
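
The general idea of integrating germline variants into a reference while keeping track of coordinates can be illustrated with a minimal sketch. This is not PRESM's implementation: the abstract does not describe the two-step procedure in detail, so the function names (`personalize`, `to_reference`), the 0-based `(position, ref allele, alt allele)` variant tuples, and the block-based coordinate map are all assumptions made purely for illustration.

```python
from bisect import bisect_right


def personalize(reference, variants):
    """Apply germline variants to a reference string (illustrative only).

    variants: iterable of (pos, ref, alt), with 0-based pos and
    non-overlapping alleles (hypothetical input format).
    Returns the personalized sequence and a list of aligned blocks
    (personal_start, reference_start, length) for coordinate lookup.
    """
    pieces, blocks = [], []
    cursor = 0  # next unread reference position
    out = 0     # length of the personalized sequence so far

    def emit(seq, ref_start, mapped_len):
        nonlocal out
        if mapped_len:
            blocks.append((out, ref_start, mapped_len))
        pieces.append(seq)
        out += len(seq)

    for pos, ref, alt in sorted(variants):
        if pos < cursor:
            raise ValueError("overlapping variants are not supported")
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"ref allele mismatch at position {pos}")
        # Unchanged reference bases before the variant.
        emit(reference[cursor:pos], cursor, pos - cursor)
        # Only the bases shared by ref and alt keep a reference coordinate.
        emit(alt, pos, min(len(ref), len(alt)))
        cursor = pos + len(ref)
    emit(reference[cursor:], cursor, len(reference) - cursor)
    return "".join(pieces), blocks


def to_reference(personal_pos, blocks):
    """Map a personalized coordinate back to the reference.

    Returns None for bases that exist only in the personalized genome
    (for example, inside a germline insertion).
    """
    starts = [b[0] for b in blocks]
    i = bisect_right(starts, personal_pos) - 1
    if i < 0:
        return None
    p_start, r_start, length = blocks[i]
    offset = personal_pos - p_start
    return r_start + offset if offset < length else None


# Toy example: one germline SNV and one germline insertion.
reference = "ACGTACGTAC"
germline = [(2, "G", "T"), (5, "C", "CTT")]
personal, blocks = personalize(reference, germline)
```

In this toy case, a base downstream of the insertion maps back to its original reference position, while a base inside the insertion has no reference coordinate and maps to `None`. A real pipeline would also have to handle diploid genotypes, multiple chromosomes, and the translation of alignments rather than single positions.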
