Abstract
In genome-wide association studies, high-level statistical analyses rely on the validity of the called genotypes, and different genotype calling algorithms (GCAs) have been proposed. We compared the GCAs Bayesian robust linear modeling using Mahalanobis distance (BRLMM), Chiamo++, and JAPL using the autosomal single-nucleotide polymorphisms (SNPs) from the 500k Affymetrix Array Set data of the Framingham Heart Study as provided for the Genetic Analysis Workshop 16, Problem 2, and applied standard quality control (sQC) to the calls of each algorithm. JAPL retained the most individuals for the analysis. BRLMM yielded the fewest SNPs that passed sQC and Chiamo++ the most. All three GCAs fulfilled all sQC criteria for 79% of the SNPs, but at least one GCA failed for 18% of the SNPs. Previously undetected errors in strand coding were identified by comparing genotype concordances between GCAs. Concordance decreased as the number of GCAs failing sQC increased. We conclude that JAPL is the GCA of choice if the aim is to keep as many subjects as possible, and Chiamo++ if the aim is to keep as many SNPs as possible.
Highlights
A crucial step in the data generation process of genome-wide association studies is genotype calling
We investigated the influence of genotype-calling algorithms (GCAs) on autosomal single-nucleotide polymorphisms (SNPs) that passed filtering for strand-coding errors and standard quality control
A total of 486,605 SNPs were provided for Genetic Analysis Workshop 16 (GAW16)
Summary
A crucial step in the data generation process of genome-wide association studies is genotype calling, in which qualitative genotypes are derived from the measured signal intensities of the two alleles of a single-nucleotide polymorphism (SNP). Because missing or erroneous genotypes can compromise the high-level statistical association analysis, a series of different genotype-calling algorithms (GCAs) has been proposed [1]. The outcomes of these GCAs can differ substantially [2]. We compared different GCAs using the genotype data from participants of the Framingham Heart Study SNP Health Association Resource project. We investigated the influence of GCAs on autosomal SNPs that passed filtering for strand-coding errors and standard quality control (sQC).
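As an illustration of how comparing genotype concordance between GCAs can expose a strand-coding error, the following is a minimal sketch and not the procedure used in the study. It assumes genotypes are given as two-letter allele strings with `None` for a missing call; the function names and the 0.95 threshold are hypothetical choices made for this example.

```python
from typing import Optional, Sequence

# Watson-Crick complements, used to re-express a genotype on the opposite strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

Genotype = Optional[str]  # e.g. "AG"; None marks a missing call


def normalize(genotype: Genotype) -> Genotype:
    """Sort the two alleles so that 'GA' and 'AG' compare equal."""
    if genotype is None:
        return None
    return "".join(sorted(genotype))


def flip_strand(genotype: Genotype) -> Genotype:
    """Return the genotype as it would read on the opposite strand."""
    if genotype is None:
        return None
    return "".join(COMPLEMENT[allele] for allele in genotype)


def concordance(calls_a: Sequence[Genotype], calls_b: Sequence[Genotype]) -> Optional[float]:
    """Fraction of identical genotypes among individuals called by both algorithms."""
    pairs = [
        (normalize(a), normalize(b))
        for a, b in zip(calls_a, calls_b)
        if a is not None and b is not None
    ]
    if not pairs:
        return None  # no individual was called by both algorithms
    return sum(a == b for a, b in pairs) / len(pairs)


def looks_strand_flipped(
    calls_a: Sequence[Genotype],
    calls_b: Sequence[Genotype],
    threshold: float = 0.95,  # hypothetical cut-off, not taken from the study
) -> bool:
    """Flag a SNP whose concordance is low as coded but high after flipping one call set."""
    direct = concordance(calls_a, calls_b)
    flipped = concordance(calls_a, [flip_strand(g) for g in calls_b])
    if direct is None or flipped is None:
        return False
    return direct < threshold <= flipped


# Made-up calls for one A/G SNP in five individuals from two algorithms.
algo_1 = ["AG", "AA", "GG", None, "AG"]
algo_2 = ["CT", "TT", "CC", "CT", "TC"]

direct = concordance(algo_1, algo_2)
after_flip = concordance(algo_1, [flip_strand(g) for g in algo_2])
flagged = looks_strand_flipped(algo_1, algo_2)
```

In this toy example the two call sets disagree on every jointly called individual as coded but agree completely once one set is complemented, which is the pattern a strand-coding error would produce. The check cannot detect flips at A/T or C/G SNPs, because complementing those genotypes yields the same allele pair.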