Abstract
Structural variants (SVs) are short sequences of DNA, longer than a single nucleotide, that can vary between members of the same species. Although SVs are relatively rare compared to single nucleotide variants (SNVs), they are an important source of genetic variation, and some SVs have been associated with diseases and with susceptibility to certain types of cancer. SV detection is commonly performed by aligning sequenced fragments of an individual's genome to a high-quality reference genome. Candidate SVs correspond to discordant mapping configurations of fragments; however, sequencing errors can also produce discordant mappings. As a result, many candidate SVs are in fact false positives. SV detection is more accurate when sequencing coverage is high, but high coverage comes at a higher sequencing cost. Sequencing at low coverage reduces cost but makes SV detection more error-prone and more complex. The goal of our work is to use mathematical optimization to improve SV detection in low-coverage DNA sequencing data. Previous studies of SV detection have modeled coverage with a Poisson distribution, which assumes that the mean and variance are equal. To model the experimental data more closely, we use the negative binomial distribution, which allows the mean and variance to differ and contains the Poisson distribution as a limiting case. Our approach also controls false-positive predictions by predicting SVs jointly in a parent and a child. We assume that most SVs carried by a child are inherited from a parent but that a small fraction may be novel to the child. We balance the rarity of novel versus inherited SVs by enforcing sparsity through an l1-penalty, and we compare this negative binomial reconstruction algorithm to the Poisson reconstruction algorithm by testing both on the same simulated data sets.
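To make the Poisson versus negative binomial contrast concrete, here is a minimal Python sketch using only the standard library. It assumes the common mean-dispersion parameterization of the negative binomial (mean mu, dispersion r, variance mu + mu^2/r); the abstract does not say which parameterization the authors use. The helpers `poisson_logpmf`, `negbin_logpmf`, `moments`, and `soft_threshold` are hypothetical names chosen for illustration. `soft_threshold` only shows, in isolation, how an l1-penalty drives small values exactly to zero; it is not the authors' parent-child reconstruction algorithm.

```python
import math


def poisson_logpmf(k: int, mu: float) -> float:
    """Log-probability of k reads when coverage ~ Poisson(mu); variance == mean."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def negbin_logpmf(k: int, mu: float, r: float) -> float:
    """Log-probability under a negative binomial with mean mu and dispersion r.

    Assumed parameterization: Var = mu + mu**2 / r, so the variance exceeds
    the mean (overdispersion); as r -> infinity the pmf tends to Poisson(mu).
    """
    log_r_plus_mu = math.log(r + mu)
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * (math.log(r) - log_r_plus_mu)     # r * log(p), with p = r / (r + mu)
            + k * (math.log(mu) - log_r_plus_mu))   # k * log(1 - p)


def moments(logpmf, k_max: int = 2000):
    """Mean and variance of a count distribution, summing its pmf over 0..k_max."""
    probs = [math.exp(logpmf(k)) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


def soft_threshold(x: float, tau: float) -> float:
    """Proximal operator of tau * |x|: values with |x| <= tau become exactly 0."""
    if abs(x) <= tau:
        return 0.0
    return x - tau if x > 0 else x + tau


if __name__ == "__main__":
    mu = 10.0
    print(moments(lambda k: poisson_logpmf(k, mu)))      # mean ~ 10, variance ~ 10
    print(moments(lambda k: negbin_logpmf(k, mu, 2.0)))  # mean ~ 10, variance ~ 60
    print([soft_threshold(x, 0.5) for x in (0.3, 1.5, -2.0)])  # [0.0, 1.0, -1.5]
```

With the same mean, the negative binomial with r = 2 has six times the Poisson variance, which is the extra flexibility the abstract appeals to for fitting low-coverage data; letting r grow recovers the Poisson case.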