Abstract

Genetic markers, defined as variable regions of DNA, can be used to distinguish individuals or populations. As long as markers are independent, it is easy to combine the information they provide. For nonrecombinant sequences like mtDNA, choosing the right set of markers for forensic applications can be difficult and requires careful consideration. In particular, one wants to maximize the utility of the markers. Until now, this has mainly been done by hand. We propose an algorithm that finds the most informative subset of a set of markers. The algorithm uses a depth-first search combined with a branch-and-bound approach. Since the worst-case complexity is exponential, we also propose some data-reduction techniques and a heuristic. We implemented the algorithm and applied it to two forensic caseworks using mitochondrial DNA, which resulted in marker sets with significantly improved haplotypic diversity compared to previous suggestions. Additionally, we evaluated the quality of the estimation with an artificial dataset of mtDNA. The heuristic is shown to provide extensive speedup at little cost in accuracy.

Highlights

  • Genetic markers are ubiquitous in molecular biology and have many applications, such as forensic analysis, taxonomic barcoding, and detection of inherited diseases

  • We present algorithms for finding the most informative subset of markers, subject to constraints such as marker size and number, and for estimating the markers’ haplotypic diversity from a sample set given as a multiple sequence alignment (MSA)

  • Haplotypic diversity: the calculation of h is based on an estimate of the diversity of a genetic marker [5]
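
The two ideas in the highlights above, estimating haplotypic diversity from an MSA and searching for the most informative marker subset, can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: each alignment column is treated as one candidate marker, only the number of markers is constrained (not marker size), and h is computed with the commonly used estimator h = n/(n-1) · (1 - Σ p_i²), which may differ from the definition in reference [5]. The function names `haplotype_diversity` and `best_marker_subset` are hypothetical, and the bound shown is a simple one chosen for illustration, not necessarily the authors' bound.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def haplotype_diversity(seqs: Sequence[str], cols: Sequence[int]) -> float:
    """Estimate h = n/(n-1) * (1 - sum p_i^2) over the haplotypes seen at `cols`.

    Assumption: this is the common unbiased estimator; the paper's
    reference [5] may define h differently.
    """
    n = len(seqs)
    if n < 2:
        return 0.0
    # A haplotype is the tuple of characters a sequence shows at the chosen columns.
    counts = Counter(tuple(s[c] for c in cols) for s in seqs)
    return n / (n - 1) * (1.0 - sum((k / n) ** 2 for k in counts.values()))


def best_marker_subset(seqs: Sequence[str], max_markers: int) -> Tuple[Tuple[int, ...], float]:
    """Depth-first branch-and-bound over column subsets of size <= max_markers."""
    n_cols = len(seqs[0])
    best_h = -1.0
    best_cols: Tuple[int, ...] = ()

    def dfs(start: int, chosen: List[int]) -> None:
        nonlocal best_h, best_cols
        h = haplotype_diversity(seqs, chosen)
        if h > best_h:
            best_h, best_cols = h, tuple(chosen)
        if len(chosen) == max_markers or start == n_cols:
            return
        # Bound: adding columns can only split haplotypes further, so h never
        # decreases. Using every remaining column therefore gives an upper
        # bound on anything reachable from this node.
        bound = haplotype_diversity(seqs, chosen + list(range(start, n_cols)))
        if bound <= best_h:
            return  # prune: this subtree cannot beat the incumbent
        for c in range(start, n_cols):
            dfs(c + 1, chosen + [c])

    dfs(0, [])
    return best_cols, best_h


if __name__ == "__main__":
    # Toy alignment: column 0 is constant, columns 1 and 2 each split the sample.
    toy_msa = ["AAC", "AAT", "AGC", "AGT"]
    print(best_marker_subset(toy_msa, max_markers=2))
```

The bound ignores the size limit, so it is loose but safe: it never prunes a subtree that contains a strictly better subset. The search remains exponential in the worst case, which is why the abstract also mentions data-reduction techniques and a heuristic.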

Summary

Introduction

Genetic markers are ubiquitous in molecular biology and have many applications, such as forensic analysis, taxonomic barcoding, and detection of inherited diseases. While full-length sequences would be the preferred material for most studies, real-world circumstances sometimes force the use of a limited set of markers as a proxy. SNPs and short tandem repeats are commonly used as markers in forensics. How to choose markers is an important question, and many factors can affect such a decision. Sample availability, application, sequencing technology, cost, time, and practicality have to be taken into consideration [1,2,3,4]. Although high-throughput sequencing has revolutionized molecular biology and genetics, it is not yet an economically feasible route for forensic laboratories.

