Abstract

Published by the Combined DNA Index System (CODIS) program of the Federal Bureau of Investigation (FBI) in 1997, the 13 core short tandem repeat (STR) loci are widely adopted as genetic markers in forensic applications such as identity testing and paternity testing. However, these loci may be biased and may have reduced sensitivity for specific population groups. In addition, the rapid growth of entries in forensic databases raises the chance of random hits, which can lead to false identification of criminal suspects. One solution to these problems is to introduce more effective STR markers. The availability of whole-genome sequencing makes it possible to identify more reliable STR markers for forensic applications computationally. In this paper, we propose an algorithm that identifies STR markers with high discriminative ability from next-generation sequencing data. The algorithm can select a customized set of loci for a given population with pre-specified discriminative thresholds. We applied the method to 320 Chinese individuals from the 1000 Genomes Project and obtained sets with various numbers of loci, which were able to statistically identify an individual worldwide and had higher combined powers of discrimination and combined probabilities of exclusion than the existing CODIS 13 loci. For identity testing, the mean frequency of DNA profile (FDP) with the selected 11 STRs was smaller than that with the CODIS 13 STRs by Student's t-test. With more loci, much smaller FDPs were obtained. The database matching probabilities for the selected loci were also lower than those for the CODIS 13 STRs in a database with 10 billion entries. Moreover, the selected loci provided a considerably low chance of random profile matches, so that statistically no false positives could occur. The selected loci also reduced the risk of random allele matches in familial searches, with lower random allele matching probabilities. In addition, the selected STRs were statistically better than the CODIS STRs for paternity testing in our simulated data, with lower probabilities of false inclusion and false exclusion.
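
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's selection algorithm. It only uses the standard textbook definitions, assuming Hardy-Weinberg equilibrium and independent loci: the power of discrimination of a locus is one minus the sum of squared genotype frequencies, the combined power of discrimination is one minus the product of the per-locus match probabilities, and the database matching probability for a profile of frequency p in a database of N entries is 1 - (1 - p)^N. All function names and the toy allele frequencies are hypothetical and chosen for illustration.

```python
from itertools import combinations_with_replacement
from math import expm1, log1p, prod


def genotype_frequencies(allele_freqs):
    """Hardy-Weinberg genotype frequencies for one locus.

    allele_freqs maps allele label -> population frequency (summing to 1).
    """
    alleles = list(allele_freqs)
    freqs = {}
    for a, b in combinations_with_replacement(alleles, 2):
        p, q = allele_freqs[a], allele_freqs[b]
        # Homozygote: p^2; heterozygote: 2pq.
        freqs[(a, b)] = p * p if a == b else 2 * p * q
    return freqs


def power_of_discrimination(allele_freqs):
    """PD = 1 - sum of squared genotype frequencies at one locus."""
    genotypes = genotype_frequencies(allele_freqs)
    return 1.0 - sum(g * g for g in genotypes.values())


def combined_power_of_discrimination(loci):
    """Combined PD over independent loci: 1 - prod(1 - PD_i)."""
    return 1.0 - prod(1.0 - power_of_discrimination(locus) for locus in loci)


def database_match_probability(profile_frequency, n_entries):
    """Chance of at least one random match among n_entries profiles.

    Computes 1 - (1 - p)^N via log1p/expm1 so that very small p and
    very large N do not lose precision.
    """
    return -expm1(n_entries * log1p(-profile_frequency))


# Toy example with made-up allele frequencies (not data from the paper).
toy_loci = [
    {"10": 0.5, "11": 0.5},
    {"7": 0.2, "8": 0.3, "9": 0.5},
]
cpd = combined_power_of_discrimination(toy_loci)
dmp = database_match_probability(1e-15, 10_000_000_000)
```

Adding a locus can only raise the combined power of discrimination, and a smaller profile frequency lowers the database matching probability for a fixed database size, which is the relationship the abstract relies on when comparing locus sets.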
