Background/Objectives: Inferring genetic relationships from genetic data has received increasing attention in recent years, driven in particular by the rise of forensic investigative genetic genealogy (FIGG) and by the introduction of expanded SNP panels in forensic genetics. A plethora of statistical methods appear in the literature: direct-to-consumer (DTC) testing uses the shared segment approach; relationship screening in, for instance, medical genetic research uses method-of-moments estimators, e.g., estimation of the kinship coefficient; and forensic genetics commonly uses the likelihood and the likelihood ratio to evaluate the genetic data under competing hypotheses. This study aims to compare and contrast examples of these statistical methods for inferring relationships from genetic data. Methods/Results: This study includes some historical and some recently published panels of SNP markers to illustrate the strengths and caveats of the statistical methods on different marker sets and a selection of pre-defined pairwise relationships, 1st through 7th degree. Extensive simulations are performed and subsequently subset according to these marker panels. As has been shown in previous research, the likelihood ratio is the most powerful method, i.e., it yields a high rate of correct classifications, when SNP data are sparse, say below 20,000 markers, whereas the windowed kinship and segment approaches are equally powerful when very dense SNP data are available, say >20,000 markers. In between lie approaches using method-of-moments estimators, which perform well when the degree of relationship is below four but less well beyond, say, 4th degree relationships. The likelihood ratio is the only method that is easily adapted to non-pairwise tests and therefore has an additional depth not addressed in the current study.
We furthermore study genotyping error rates and their impact on the different statistical methods used to infer relationships; the results indicate that error rates below 1% have little impact across all methods, in particular for errors yielding false heterozygote genotypes.