Abstract
Genetic testing has expanded out of the research laboratory into medical practice and the direct-to-consumer market. Rapid analysis of the resulting genotype data now has a significant impact. We present a method for summarizing personal genotypes as ‘genotype fingerprints’ that meets this need for rapid analysis. Genotype fingerprints can be derived from any single nucleotide polymorphism-based assay, and remain comparable as chip designs evolve to higher marker densities. We demonstrate that these fingerprints support distinguishing among types of relationships between closely related individuals, and distinguishing closely related individuals from other individuals in the same background population. Through extremely rapid comparisons, they also support high-throughput identification of identical genotypes and of individuals in known background populations, and de novo separation of subpopulations within a large cohort. Although fingerprints do not preserve anonymity, they provide a useful degree of privacy by summarizing a genotype while preventing reconstruction of individual marker states. Genotype fingerprints are therefore well-suited as a format for public aggregation of genetic information to support ancestry and relatedness determination without revealing personal health risk status.
Highlights
A large number of genotypes have been produced by DNA hybridization, employing a variety of array designs [1]
These genotype fingerprints suit the characteristics of genotype data: they can be rapidly computed starting from any of several chip array designs, with genome coordinates expressed relative to any reference version, and the resulting fingerprints are directly comparable as long as the same fingerprint length L is used
We described fingerprints generated using single-nucleotide polymorphism (SNP) lists derived from two array designs used by 23andMe: V2, based on Illumina (Foster City, California, USA)
Summary
A large number of genotypes have been produced by DNA hybridization, employing a variety of array designs [1]. The low cost of hybridization assays relative to sequencing, including whole-genome sequencing (WGS), exome sequencing, and other forms of targeted sequencing, has led to the commoditization of array-based genotyping and enabled commercial companies (including 23andMe, AncestryDNA, Family Tree DNA, and others [2]) to offer this service directly to consumers (DTC). These assays typically yield results with high concordance [3] and low no-call rates [4]. Genotyping the same individual using different array designs can yield slightly different results, as each technology has its own biases. Even when using the same technology, genotype reference version, and variant encoding format, genotyping the same individual repeatedly can produce slightly different results due to the stochastic nature of genome processing and analysis, batch effects, or differences in the computational pipelines. In addition to relatedness applications, array-based genotyping is used as a quality control step prior to WGS when comprehensive variant information is desired.