Sparse diploid spatial biosignal recovery for genomic variation detection

Mario Banuelos, Melissa Spence, Suzanne Sindi, Rubi Almanza, Katharine Sanderson, Roummel F Marcia, Jonathan Sahagun, Lasith Adhikari, Andrew Fujikawa

doi:10.1109/memea.2017.7985888

Abstract

Structural variants (SVs), such as duplications, deletions, and inversions, are rearrangements of an individual's genome relative to a given reference. The common method for detecting SVs is to sequence fragments from an individual's genome, map them to the appropriate reference, and predict the location and type of each SV by identifying discordant mappings. However, errors in both the sequencing and mapping processes produce signals that resemble SVs, leading to inaccurate predictions. In addition, because sequencing coverage varies, it is challenging to determine whether an individual carries an SV on one or both of their chromosomes, even when evidence of the SV is present. In our work, we seek to improve upon standard methods for SV detection in three ways. First, to reduce false-positive predictions, we simultaneously predict SVs in a parent and child, using properties of inheritance to constrain the space of possible SVs. Second, we predict whether a variant is homozygous (the SV is on both chromosomes) or heterozygous (the SV is on one chromosome). Third, we use a gradient-based optimization approach and constrain our solution with a sparsity-promoting ℓ1 penalty (since SV instances should be rare). We demonstrate the improved performance of our computational approach on both simulated genomes and a parent-child trio from the 1000 Genomes Project.
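
To make the third point concrete, the following is a minimal sketch of a gradient-based solver with a sparsity-promoting ℓ1 penalty. It is not the authors' method: it assumes a simple least-squares data-fit term, a hypothetical observation matrix `A`, and an observation vector `y`, and it omits the parent-child inheritance constraints and the homozygous/heterozygous modeling described above. The function names and the parameters `lam` and `n_iter` are illustrative assumptions. The only constraint retained is that each entry of the signal lies in [0, 1].

```python
import numpy as np

def prox_l1_box(v, t):
    # Proximal step for t * ||x||_1 with the constraint 0 <= x <= 1:
    # shift each entry down by t, then clip to the box [0, 1].
    return np.clip(v - t, 0.0, 1.0)

def sparse_recover(A, y, lam=0.1, n_iter=500):
    # Proximal gradient descent on 0.5 * ||A x - y||^2 + lam * ||x||_1
    # subject to 0 <= x <= 1 (assumed least-squares model, for illustration).
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)            # gradient of the data-fit term
        x = prox_l1_box(x - step * grad, step * lam)
    return x

# Hypothetical usage: a weak spurious signal is zeroed out by the penalty,
# while a strong signal survives (slightly shrunk).
A = np.eye(5)
y = np.array([0.0, 1.0, 0.0, 0.0, 0.05])
x_hat = sparse_recover(A, y, lam=0.1)
```

Larger values of `lam` push more entries to exactly zero, which matches the expectation that SV instances are rare; the value used here is arbitrary.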

Full Text