Abstract

The detection of family relationships in genetic databases is of interest in various scientific disciplines such as genetic epidemiology, population and conservation genetics, forensic science, and genealogical research. Nowadays, screening genetic databases for related individuals is an important part of standard quality control procedures. Relatedness research is usually based on an analysis of identity-by-state (IBS) or identity-by-descent (IBD) allele sharing. Existing IBS/IBD methods mainly aim to identify first-degree relationships (parent–offspring or full siblings) and second-degree relationships (half-siblings, avuncular, or grandparent–grandchild pairs). Little attention has been paid to the detection of relationships lying between the first and second degree, such as three-quarter siblings (3/4S), who share fewer alleles than first-degree relatives but more alleles than second-degree relatives. As the sample sizes used in genetic research continue to grow, it becomes more likely that such relationships are present in the database under study. In this paper, we extend existing likelihood ratio (LR) methodology to accurately infer the existence of 3/4S, distinguishing them from full siblings and second-degree relatives. We use bootstrap confidence intervals to express uncertainty in the LRs. Our proposal accounts for linkage disequilibrium (LD) by marker pruning, and we validate our methodology with a pedigree-based simulation study accounting for both LD and recombination. An empirical genome-wide array data set from the GCAT Genomes for Life cohort project is used to illustrate the method.
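The marker pruning mentioned above can be illustrated with a minimal greedy sketch. It assumes genotypes coded as allele counts (0/1/2) and ordered by chromosomal position, and it measures LD as the squared Pearson correlation r² between allele-count vectors. The function names, the window size and the r² threshold are hypothetical illustrations, not the settings used in the paper; established tools such as PLINK's `--indep-pairwise` use a sliding window with a step size rather than this simplified scheme.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two allele-count vectors (0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker has no measurable correlation
    return sxy * sxy / (sxx * syy)


def prune_ld(genotypes, window=50, r2_max=0.2):
    """Greedy LD pruning over markers ordered by position (hypothetical helper).

    A marker is kept only if its r^2 with every previously kept marker
    fewer than `window` positions upstream is at most `r2_max`.
    Defaults are illustrative, not taken from the paper.
    Returns the indices of the kept markers.
    """
    kept = []
    for j, g in enumerate(genotypes):
        nearby = [i for i in kept if j - i < window]
        if all(r_squared(genotypes[i], g) <= r2_max for i in nearby):
            kept.append(j)
    return kept


# Toy data: rows are SNPs, columns are six individuals.
snps = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],  # copy of SNP 0 (r^2 = 1): pruned
    [1, 0, 1, 2, 1, 1],  # uncorrelated with SNP 0: kept
    [2, 1, 0, 1, 2, 0],  # r^2 = 1 with SNP 0; kept only if SNP 0 is outside the window
]
print(prune_ld(snps, window=3, r2_max=0.5))   # [0, 2, 3]
print(prune_ld(snps, window=10, r2_max=0.5))  # [0, 2]
```

Greedy pruning keeps the first marker of each correlated run, so the result depends on marker order; this is a deliberate simplification.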

Highlights

  • We show that the likelihood ratio approach is useful for distinguishing three-quarter siblings (3/4S) from full siblings (FS) and 2nd-degree relatives

  • The likelihood ratio (LR) approach can be of great help in detecting such intermediate relationships

  • The LR approach developed in this paper confirmed eight 3/4S pairs previously uncovered by a log-ratio biplot (LR-kinbiplot) approach (Graffelman et al, 2019) in genome-wide SNP array data from the GCAT cohort
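The comparisons listed in these highlights (3/4S against FS and against 2nd-degree relatives) can be sketched with the standard pairwise likelihood based on IBD coefficients (k0, k1, k2). This is a minimal illustration, not the authors' implementation: it assumes unlinked (for example, LD-pruned) biallelic SNPs, Hardy–Weinberg proportions, known allele frequencies strictly between 0 and 1, no inbreeding and no genotyping error. A percentile bootstrap over markers is used as one simple way to attach the confidence intervals mentioned in the abstract. All function names and the simulated data are hypothetical.

```python
import math
import random

# IBD coefficients (k0, k1, k2): probabilities that a non-inbred pair shares
# 0, 1 or 2 alleles identical by descent at a locus.
RELATIONSHIPS = {
    "FS": (0.25, 0.50, 0.25),      # full siblings
    "3/4S": (0.375, 0.50, 0.125),  # three-quarter siblings
    "HS": (0.50, 0.50, 0.0),       # half-siblings; same values for any 2nd-degree pair
}


def _hwe(g, p):
    """P(genotype) under Hardy-Weinberg; g = copies of allele A, frequency p."""
    return {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}[g]


def _given_shared(g, shared_is_a, p):
    """P(genotype g | one allele is the shared IBD allele, the other is random)."""
    base = 1 if shared_is_a else 0  # copies of A carried by the shared allele
    if g == base + 1:
        return p       # the random allele is A
    if g == base:
        return 1 - p   # the random allele is a
    return 0.0


def pair_likelihood(g1, g2, p, k):
    """P(g1, g2) at one biallelic SNP for a pair with IBD coefficients k."""
    k0, k1, k2 = k
    p0 = _hwe(g1, p) * _hwe(g2, p)
    p1 = (p * _given_shared(g1, True, p) * _given_shared(g2, True, p)
          + (1 - p) * _given_shared(g1, False, p) * _given_shared(g2, False, p))
    p2 = _hwe(g1, p) if g1 == g2 else 0.0
    return k0 * p0 + k1 * p1 + k2 * p2


def log10_lr_terms(geno1, geno2, freqs, h1, h2):
    """Per-marker log10 LR contributions for H1 vs H2; their sum is log10 LR."""
    return [math.log10(pair_likelihood(a, b, p, RELATIONSHIPS[h1])
                       / pair_likelihood(a, b, p, RELATIONSHIPS[h2]))
            for a, b, p in zip(geno1, geno2, freqs)]


def bootstrap_ci(terms, n_boot=1000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for the summed log10 LR (markers resampled)."""
    rng = random.Random(seed)
    stats = sorted(sum(rng.choices(terms, k=len(terms))) for _ in range(n_boot))
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


def simulate_pair(k, p, rng):
    """Genotypes of one pair at an unlinked SNP, drawing the IBD state from k."""
    def allele():
        return 1 if rng.random() < p else 0  # 1 = allele A
    z = rng.choices((0, 1, 2), weights=k)[0]
    if z == 0:
        return allele() + allele(), allele() + allele()
    if z == 1:
        shared = allele()
        return shared + allele(), shared + allele()
    g = allele() + allele()
    return g, g


# Usage: simulate a 3/4S pair at 10,000 unlinked SNPs and test it against FS and HS.
rng = random.Random(2024)
freqs = [rng.uniform(0.2, 0.8) for _ in range(10_000)]
pairs = [simulate_pair(RELATIONSHIPS["3/4S"], p, rng) for p in freqs]
geno1 = [a for a, _ in pairs]
geno2 = [b for _, b in pairs]
for alt in ("FS", "HS"):
    terms = log10_lr_terms(geno1, geno2, freqs, "3/4S", alt)
    lo, hi = bootstrap_ci(terms, n_boot=200)
    print(f"log10 LR(3/4S vs {alt}) = {sum(terms):.1f}, 95% CI [{lo:.1f}, {hi:.1f}]")
```

Positive log10 LR values favor 3/4S. Because all 2nd-degree relationships share the IBD coefficients (1/2, 1/2, 0), the HS hypothesis stands for any 2nd-degree pair at unlinked markers; telling the 2nd-degree types apart would require linkage information, which this sketch ignores.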


Summary

Introduction

The sample sizes used in genetic studies, genome-wide association studies (GWAS) in particular, are progressively increasing owing to large human genomic projects that involve genetic data from hundreds of thousands of individuals, such as UK Biobank (Bycroft et al, 2018), gnomAD (Karczewski et al, 2020), TOPMed (Taliun et al, 2019), and DiscovEHR (Staples et al, 2018), among others. With such large databases, it becomes increasingly likely that relationships intermediate between the 1st and 2nd degree, or between the 2nd and 3rd degree, are present. We develop a likelihood ratio (LR) approach that allows us to identify three-quarter siblings (3/4S), a family relationship whose members share fewer alleles than 1st-degree relatives but more alleles than 2nd-degree relatives (Table 1). We end the article with a discussion of the proposed methodology.
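The statement that 3/4S share fewer alleles than 1st-degree relatives but more than 2nd-degree relatives can be checked with a short calculation. The sketch below is an illustration under stated assumptions: a non-inbred pair, maternal and paternal sides unrelated to each other, and the common 3/4S configuration in which the two children share one parent while their other parents are full siblings. The pair's maternal (or paternal) alleles are IBD with probability equal to the kinship coefficient of the corresponding parents, and kinship follows from the IBD coefficients as φ = k1/4 + k2/2. The function names are hypothetical.

```python
from fractions import Fraction as F


def ibd_coefficients(a, b):
    """(k0, k1, k2) for a non-inbred pair whose maternal alleles are IBD with
    probability a and paternal alleles with probability b, independently."""
    k2 = a * b
    k0 = (1 - a) * (1 - b)
    return k0, 1 - k0 - k2, k2


def kinship(k):
    """Kinship coefficient from IBD coefficients: phi = k1/4 + k2/2."""
    _, k1, k2 = k
    return k1 / 4 + k2 / 2


# a, b = kinship of the two children's parents on each side:
# same parent -> 1/2, parents who are full siblings -> 1/4, unrelated -> 0.
cases = {
    "FS": ibd_coefficients(F(1, 2), F(1, 2)),    # both parents shared
    "3/4S": ibd_coefficients(F(1, 2), F(1, 4)),  # one shared parent, other parents full sibs
    "HS": ibd_coefficients(F(1, 2), F(0)),       # one shared parent, others unrelated
}
for name, k in cases.items():
    print(name, [str(x) for x in k], "kinship =", kinship(k))
# FS ['1/4', '1/2', '1/4'] kinship = 1/4
# 3/4S ['3/8', '1/2', '1/8'] kinship = 3/16
# HS ['1/2', '1/2', '0'] kinship = 1/8
```

The 3/4S kinship of 3/16 lies strictly between the full-sibling value 1/4 and the 2nd-degree value 1/8, which is why 3/4S pairs fall in the gap between the two usual relationship classes.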
