Structured Low-Rank Matrix Factorization for Haplotype Assembly

Changxiao Cai, Haris Vikalo, Sujay Sanghavi

doi:10.1109/jstsp.2016.2547860

Abstract

In matrix factorization problems, one seeks to decompose a data matrix into a product of two matrices; frequently, one captures meaningful information contained in the data, and the other specifies how this information is combined to generate the data matrix. In this paper, matrix factorization that arises in haplotype assembly, an important NP-hard problem in genomics, is studied. Haplotypes are sequences of chromosomal variations in an individual’s genome, which are of critical importance for understanding the individual’s susceptibility to various diseases. A novel formulation of haplotype assembly as a partially observed low-rank matrix factorization problem is proposed and efficiently solved via a modified gradient descent method that exploits salient structural properties of sequencing data. In particular, the observed matrix in the problem at hand contains noisy samples of the product of an informative matrix with rows having entries from a finite alphabet and a matrix with rows that are standard unit basis vectors. Convergence of the proposed algorithm is analyzed and its performance is tested on both synthetic and experimental data. The results demonstrate superior accuracy and speed of the proposed method as compared to state-of-the-art haplotype assembly techniques.
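The structure described in the abstract can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper, only a generic alternating scheme under stated assumptions: a diploid setting with two haplotypes, alleles encoded as -1/+1, unobserved entries stored as 0, and a read-by-SNP matrix R with a boolean observation mask. The function name `assemble`, the step size, and the clipping to [-1, 1] are hypothetical choices made for illustration.

```python
import numpy as np

def assemble(R, mask, n_iters=50, step=0.5, seed=0):
    """Illustrative sketch, not the paper's method.

    R    : (reads x SNPs) array, entries in {-1, +1}, 0 where unobserved.
    mask : boolean array of the same shape, True where R is observed.
    Returns U (one-hot read memberships) and H_hat (2 x SNPs, entries in {-1, +1}).
    """
    rng = np.random.default_rng(seed)
    mask = mask.astype(float)
    n, m = R.shape
    # Relaxed (real-valued) haplotype matrix, small random start (assumption).
    H = rng.uniform(-0.1, 0.1, size=(2, m))
    U = np.zeros((n, 2))
    for _ in range(n_iters):
        # Assignment step: each read picks the haplotype row with the
        # smallest squared error over its observed entries.
        err = np.stack(
            [(((R - H[k]) ** 2) * mask).sum(axis=1) for k in range(2)], axis=1
        )
        U = np.eye(2)[err.argmin(axis=1)]  # rows are standard unit basis vectors
        # Gradient step on 0.5 * ||mask * (U H - R)||_F^2 with respect to H,
        # scaled by the number of observations per haplotype and SNP.
        grad = U.T @ (mask * (U @ H - R))
        counts = U.T @ mask
        H = np.clip(H - step * grad / np.maximum(counts, 1.0), -1.0, 1.0)
    # Project the relaxed entries back onto the finite alphabet {-1, +1}.
    H_hat = np.where(H >= 0, 1, -1)
    return U, H_hat
```

Here U plays the role of the matrix whose rows are standard unit basis vectors (each read belongs to exactly one haplotype), and H_hat plays the role of the informative matrix over a finite alphabet. Only observed entries contribute to the error and to the gradient, which is what makes the factorization partially observed.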

Full Text