Abstract

Background: The assignment of DNA samples to coarse population groups can be a useful but difficult task. One such example is the inference of coarse ethnic groupings for forensic applications. Ethnicity plays an important role in forensic investigation and can be inferred with the help of genetic markers. Because it is maternally inherited, present in high copy number, and persists robustly in degraded samples, mitochondrial DNA may be useful for inferring coarse ethnicity. In this study, we compare the performance of methods for inferring ethnicity from the sequence of the hypervariable region of the mitochondrial genome.

Results: We present the results of comprehensive experiments conducted on datasets extracted from the mtDNA population database, showing that ethnicity inference based on support vector machines (SVM) achieves an overall accuracy of 80-90%, consistently outperforming the nearest neighbor and discriminant analysis methods previously proposed in the literature. We also evaluate methods of handling missing data and characterize the most informative segments of the hypervariable region of the mitochondrial genome.

Conclusions: Support vector machines can be used to infer coarse ethnicity from a small region of mitochondrial DNA sequence with surprisingly high accuracy. In the presence of missing data, using only the regions common to the training sequences and the test sequence proves to be the best strategy. Given these results, SVM algorithms are likely to be useful in other DNA sequence classification applications as well.

Highlights

  • Human ethnic identity is a controversial and complex topic

  • We begin by briefly introducing principal component analysis (PCA), a dimensionality reduction technique used as a preprocessing step for three of the four methods

  • We describe the four classification algorithms: support vector machines (SVM), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and 1-nearest neighbor (1NN)
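
As a rough illustration of the simplest of these classifiers, the sketch below pairs 1NN with the missing-data strategy the abstract reports as best: comparing sequences only over the regions common to the training sequences and the test sequence. It is a minimal sketch under stated assumptions, not the authors' implementation. The Hamming distance, the `N` placeholder for untyped positions, the function names, and the toy sequences and group labels are all made up for illustration.

```python
from typing import List, Sequence, Tuple

MISSING = "N"  # assumed placeholder for an untyped (missing) position


def common_positions(train_seqs: Sequence[str], test_seq: str) -> List[int]:
    """Indices typed in the test sequence and in every training sequence."""
    return [
        i
        for i, base in enumerate(test_seq)
        if base != MISSING and all(seq[i] != MISSING for seq in train_seqs)
    ]


def hamming(a: str, b: str, positions: Sequence[int]) -> int:
    """Number of mismatches between a and b, counted only at the given positions."""
    return sum(a[i] != b[i] for i in positions)


def predict_1nn(train: Sequence[Tuple[str, str]], test_seq: str) -> str:
    """Return the label of the training sequence closest to test_seq."""
    positions = common_positions([seq for seq, _ in train], test_seq)
    if not positions:
        raise ValueError("no positions shared by the test and training sequences")
    # min() keeps the first training sequence among ties.
    _, label = min(train, key=lambda pair: hamming(pair[0], test_seq, positions))
    return label


# Made-up, pre-aligned toy sequences with placeholder labels (not real data).
TRAIN = [
    ("ACGTACGT", "group_A"),
    ("ACGTACGA", "group_A"),
    ("TCGAACCT", "group_B"),
    ("TCGAANCT", "group_B"),
]
QUERY = "NCGAACCT"

if __name__ == "__main__":
    print(predict_1nn(TRAIN, QUERY))
```

Restricting the comparison to shared positions means no missing base is imputed or counted as a mismatch, at the cost of discarding positions that are typed in only some of the sequences.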


Summary

Introduction

Human ethnic identity is a controversial and complex topic. Each human individual is a complex mosaic of genetic material originating from a multitude of ancestral sources. The use of panels of autosomal markers has been shown to provide excellent accuracy for assigning samples to specific clades [1,2]. These approaches rely on typing large numbers of autosomal loci that may not survive long periods of degradation. Several studies, including [3,4,5], have previously shown the feasibility of inferring the probable ethnicity and/or geographic origin from the sequence of the hypervariable region (HVR) of the mitochondrial genome. Because it is maternally inherited, present in high copy number, and persists robustly in degraded samples, mitochondrial DNA may be useful for inferring coarse ethnicity. In this study, we compare the performance of methods for inferring ethnicity from the sequence of the hypervariable region of the mitochondrial genome.

