Abstract

Inferring the haplotypes of the members of a pedigree from their genotypes has been extensively studied. However, most studies do not consider genotyping errors and de novo mutations. In this paper, we study how to infer haplotypes from genotype data that may contain genotyping errors, de novo mutations, and missing alleles. We assume that there are no recombinants in the genotype data, which is usually true for tightly linked markers. We introduce a combinatorial optimization problem, called haplotype configuration with mutations and errors (HCME), which calls for haplotype configurations consistent with the given genotypes that incur no recombinants and require the minimum number of mutations and errors. HCME is NP-hard. To solve the problem, we propose a heuristic algorithm, the core of which is an integer linear program (ILP) using the system of linear equations over Galois field GF(2). Our algorithm can detect and locate genotyping errors that cannot be detected by simply checking the Mendelian law of inheritance. The algorithm also offers error correction in genotypes/haplotypes rather than just detecting inconsistencies and deleting the involved loci. Our experimental results show that the algorithm can infer haplotypes with a very high accuracy and recover 65%-94% of genotyping errors depending on the pedigree topology.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call