Proper quality control of data prior to downstream analyses is fundamental to ensuring the integrity of results; genomic data are no exception. While many quality control metrics exist for genomic data, the objective of the present study was to quantify the genotype and allele concordance rates between called single nucleotide polymorphism (SNP) genotypes differing in GenCall (GC) score; the GC score is a confidence measure assigned to each Illumina genotype call. This objective was achieved using Illumina BeadChip genotype data from 771 cattle (12,428,767 genotypes in total post-editing) and 80 sheep (1,557,360 SNP genotypes in total post-editing), each genotyped in duplicate. The called genotype with the lower GC score was compared with the genotype called for the same SNP in the duplicate sample of the same animal that had a GC score of >0.90 (assumed to represent the true genotype). The mean genotype concordance rate for GC scores of <0.300, 0.300-0.549, and ≥0.550 in the cattle (sheep in parentheses) was 0.9467 (0.9864), 0.9707 (0.9953), and 0.9994 (0.99997), respectively; the respective allele concordance rates were 0.9730 (0.9930), 0.9849 (0.9976), and 0.9997 (0.99998). Hence, concordance eroded as the GC score of the called genotype decreased, although the impact was modest and became noticeable only at GC scores of <0.55. Moreover, the impact was greater and more consistent in the cattle population than in the sheep population. Furthermore, GC score affected the genotype concordance rate even among SNPs with the same GenTrain value; the GenTrain value is a statistical score that describes the shape of the genotype clusters and the relative distance between the called genotype clusters.
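
The comparison described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The GC score bins and the >0.90 reference threshold come from the abstract; the input layout (genotypes coded as 0, 1, or 2 copies of one allele), the function names, and the definition of allele concordance as the proportion of shared alleles per genotype pair are assumptions made for illustration.

```python
from collections import defaultdict


def gc_bin(gc_score):
    """Assign a GC score to one of the three bins reported in the abstract."""
    if gc_score < 0.300:
        return "<0.300"
    if gc_score < 0.550:
        return "0.300-0.549"
    return ">=0.550"


def concordance_by_gc_bin(pairs, reference_min_gc=0.90):
    """Summarise concordance between duplicate genotype calls per GC bin.

    pairs: iterable of (geno_a, gc_a, geno_b, gc_b), one tuple per SNP per
    duplicated animal; genotypes are allele counts 0, 1, or 2 (assumed coding).
    Returns {bin: (n_pairs, genotype_concordance, allele_concordance)}.
    """
    totals = defaultdict(lambda: [0, 0, 0.0])  # n, genotype matches, allele share
    for geno_a, gc_a, geno_b, gc_b in pairs:
        # The call with the lower GC score is tested against the other call.
        if gc_a <= gc_b:
            test, test_gc, ref, ref_gc = geno_a, gc_a, geno_b, gc_b
        else:
            test, test_gc, ref, ref_gc = geno_b, gc_b, geno_a, gc_a
        # Keep the pair only if the reference call has a GC score above 0.90.
        if ref_gc <= reference_min_gc:
            continue
        entry = totals[gc_bin(test_gc)]
        entry[0] += 1
        entry[1] += int(test == ref)
        # Assumed definition: proportion of the two alleles that agree.
        entry[2] += 1.0 - abs(test - ref) / 2.0
    return {b: (n, m / n, a / n) for b, (n, m, a) in totals.items()}


# Hypothetical toy data, not taken from the study.
example_pairs = [
    (0, 0.20, 0, 0.95),  # concordant, lowest bin
    (1, 0.25, 2, 0.95),  # discordant by one allele, lowest bin
    (2, 0.40, 2, 0.93),  # concordant, middle bin
    (1, 0.70, 1, 0.99),  # concordant, highest bin
    (0, 0.60, 1, 0.80),  # excluded: reference GC score is not above 0.90
]
summary = concordance_by_gc_bin(example_pairs)
```

Under this assumed definition, a genotype pair that differs by a single allele contributes 0 to genotype concordance but 0.5 to allele concordance, so allele concordance is never lower than genotype concordance within a bin.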