The forensic science community is poised to use advances in massively parallel sequencing (MPS) technologies to characterize biological samples at higher resolution. A critical step toward advancing forensic DNA analysis with these technologies is a comprehensive understanding of the diversity and population distribution of sequence-based short tandem repeat (STR) alleles. Here we analyzed 786 samples from individuals of different population groups, including four of the most commonly encountered in forensic casework in the USA. DNA samples were amplified with the PowerSeq™ Auto/Y System Prototype Kit (Promega Corp.), and sequencing was performed on an Illumina® MiSeq instrument. Sequence data were analyzed using a bioinformatics processing tool, Altius. For additional data analysis and profile comparison, capillary electrophoresis (CE) size-based STR genotypes were generated for a subset of individuals and, where possible, genotypes were also generated with a second commercially available MPS STR assay. Autosomal STR loci were analyzed, and allele frequencies were calculated based on sequence composition. Population genetic analyses were also performed, assessing Hardy–Weinberg equilibrium, polymorphic information content (PIC), and observed and expected heterozygosity. Overall, sequence-based allelic variants of the repeat region were observed in 20 out of 22 different STR loci commonly used in forensic DNA genotyping, with the highest number of sequence variants observed at locus D12S391. The largest increases in allelic diversity and in PIC through sequence-based genotyping were observed at loci D3S1358 and D8S1179. Detailed sequence analysis of the kind performed in the present study is important for understanding the diversity of sequence-based STR alleles across different populations and for demonstrating how such allelic variation can improve the statistics used in forensic casework.
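As an illustration of why sequence-based genotyping can raise PIC and expected heterozygosity, the following minimal sketch computes both statistics from allele frequencies using the standard textbook formulas (expected heterozygosity as 1 − Σp², without a sample-size correction, and PIC as defined by Botstein et al.). The allele frequencies are hypothetical and are not taken from the study; the function names are likewise illustrative and do not describe the Altius tool or the authors' actual workflow.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """Expected heterozygosity: 1 - sum(p_i^2), no sample-size correction."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphic information content:
    1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2.
    """
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical frequencies (NOT from the study): three length-based alleles.
length_based = [0.5, 0.3, 0.2]

# Same hypothetical locus after sequencing: the 0.5 allele is assumed to
# split into two sequence variants of equal frequency.
sequence_based = [0.25, 0.25, 0.3, 0.2]

for label, freqs in (("length-based", length_based),
                     ("sequence-based", sequence_based)):
    print(f"{label}: He = {expected_heterozygosity(freqs):.4f}, "
          f"PIC = {pic(freqs):.4f}")
```

In this toy case, splitting one common length-based allele into two sequence variants increases both statistics, which is the general mechanism by which sequence-level resolution can strengthen match statistics at a locus.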