Abstract

Compound Heterozygous (CH) variant identification requires distinguishing maternally from paternally derived nucleotides, a process that requires numerous computational tools. Using such tools often introduces unforeseen challenges such as installation procedures that are operating-system specific, software dependencies that must be installed, and formatting requirements for input files. To overcome these challenges, we developed Compound Heterozygous Variant Identification Pipeline (CompoundHetVIP), which uses a single Docker image to encapsulate commonly used software tools for file aggregation (BCFtools or GATK4), VCF liftover (Picard Tools), joint-genotyping (GATK4), file conversion (Plink2), phasing (SHAPEIT2, Beagle, and/or Eagle2), variant normalization (vt tools), annotation (SnpEff), relational database generation (GEMINI), and identification of CH, homozygous alternate, and de novo variants in a series of 13 steps. To begin using our tool, researchers need only install the Docker engine and download the CompoundHetVIP Docker image. The tools provided in CompoundHetVIP, subject to the limitations of the underlying software, can be applied to whole-genome, whole-exome, or targeted exome sequencing data of individual samples or trios (a child and both parents), using VCF or gVCF files as initial input. Each step of the pipeline produces an analysis-ready output file that can be further evaluated. To illustrate its use, we applied CompoundHetVIP to data from a publicly available Ashkenazim trio and identified two genes with a candidate CH variant and two genes with a candidate homozygous alternate variant after filtering based on user-set thresholds for global minor allele frequency, Combined Annotation Dependent Depletion, and Gene Damage Index. While this example uses genomic data from a healthy child, we anticipate that most researchers will use CompoundHetVIP to uncover missing heritability in human diseases and other phenotypes. 
CompoundHetVIP is open-source software and can be found at https://github.com/dmiller903/CompoundHetVIP; this repository also provides detailed, step-by-step examples.
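The final pipeline step labels variants in a trio as CH, homozygous alternate, or de novo. The sketch below covers the last two labels for a single biallelic site, using child, father, and mother GT strings. It is illustrative only. The rules are assumptions, not the exact logic used by CompoundHetVIP or GEMINI: homozygous alternate means the child is ALT/ALT and each parent carries one ALT allele, and de novo means the child carries an ALT allele that neither parent carries. The function names are hypothetical.

```python
from typing import Optional


def alt_count(gt: str) -> Optional[int]:
    """Count ALT alleles in a biallelic VCF GT string such as '0/1' or '1|1'.

    Returns None for missing calls such as './.' or '.|.'.
    """
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return None
    return sum(1 for a in alleles if a != "0")


def classify_trio_variant(child: str, father: str, mother: str) -> Optional[str]:
    """Label one biallelic site from child/father/mother genotypes.

    Illustrative rules (assumptions, not CompoundHetVIP's or GEMINI's exact logic):
      - homozygous alternate: child is ALT/ALT and each parent carries one ALT
      - de novo: child carries an ALT allele that neither parent carries
    """
    c, f, m = alt_count(child), alt_count(father), alt_count(mother)
    if None in (c, f, m):
        return None  # sites with a missing genotype are left unclassified
    if c == 2 and f == 1 and m == 1:
        return "homozygous_alternate"
    if c >= 1 and f == 0 and m == 0:
        return "de_novo"
    return None


# Hypothetical genotypes for illustration.
print(classify_trio_variant("1|1", "0|1", "1|0"))  # homozygous_alternate
print(classify_trio_variant("0/1", "0/0", "0/0"))  # de_novo
```

Unphased genotypes are enough for these two labels. CH calls are different because they also need to know which haplotype each ALT allele sits on.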

Highlights

  • Compound Heterozygous (CH) variant identification requires distinguishing maternally from paternally derived nucleotides, a process that requires numerous computational tools

  • Variants are retained only if Combined Annotation Dependent Depletion (CADD) and minor allele frequency (MAF) scores are available, the variants lie in exonic regions, and they have a “MED” or “HIGH” putative impact

  • In the child of this trio, we identified candidate CH variants in two genes (FLNB and TTN) using a MAF threshold of 0.01 and a CADD score threshold of 15
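The filtering criteria in the highlights can be expressed as a single predicate. The sketch below assumes a few things that the text does not specify. The Variant fields are hypothetical. The impact labels follow GEMINI's severity categories. Retained variants are those with MAF at or below 0.01 and CADD at or above 15, and whether those comparisons are strict is also an assumption. The Gene Damage Index filter mentioned in the abstract is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    """Hypothetical per-variant annotation record."""
    gene: str
    exonic: bool
    impact: str             # putative impact, e.g. "HIGH", "MED", "LOW"
    maf: Optional[float]    # global minor allele frequency; None if unavailable
    cadd: Optional[float]   # CADD score; None if unavailable


def passes_filters(v: Variant, max_maf: float = 0.01, min_cadd: float = 15.0) -> bool:
    """Apply the highlight criteria: scores present, exonic, MED/HIGH impact, thresholds met."""
    if v.maf is None or v.cadd is None:
        return False  # both MAF and CADD must be available
    if not v.exonic:
        return False  # only exonic variants are considered
    if v.impact not in {"MED", "HIGH"}:
        return False
    return v.maf <= max_maf and v.cadd >= min_cadd


# Hypothetical variants, not data from the trio.
candidates = [
    Variant("GENE_A", True, "HIGH", 0.001, 25.0),
    Variant("GENE_B", True, "LOW", 0.001, 25.0),
    Variant("GENE_C", True, "MED", None, 30.0),
]
print([v.gene for v in candidates if passes_filters(v)])  # ['GENE_A']
```

Exposing `max_maf` and `min_cadd` as parameters matches the abstract's description of the thresholds as user-set.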


Summary

Introduction

Compound Heterozygous (CH) variant identification requires distinguishing maternally from paternally derived nucleotides, a process that requires numerous computational tools. Using such tools often introduces unforeseen challenges, such as operating-system-specific installation procedures, software dependencies that must be installed, and formatting requirements for input files. We applied CompoundHetVIP, a pipeline that packages these tools in a single Docker image, to data from a publicly available Ashkenazim trio. After filtering based on user-set thresholds for global minor allele frequency, Combined Annotation Dependent Depletion, and Gene Damage Index, we identified two genes with a candidate CH variant and two genes with a candidate homozygous alternate variant. While this example uses genomic data from a healthy child, we anticipate that most researchers will use CompoundHetVIP to uncover missing heritability in human diseases and other phenotypes.
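Phasing is what makes CH calls possible. A gene carries a candidate CH variant pair when the child has ALT alleles at different sites on different haplotypes, so that neither parental copy of the gene is intact. The sketch below assumes the child's calls have already been phased and are given as hypothetical (gene, GT) pairs such as ("GENE_A", "0|1"). The check only needs the two ALT alleles to lie on opposite haplotypes; it does not need to know which haplotype is paternal. Real tools such as GEMINI also check the parents' genotypes, and this simplified rule does not.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def compound_het_genes(calls: Iterable[Tuple[str, str]]) -> List[str]:
    """Return genes whose heterozygous ALT alleles fall on both haplotypes.

    `calls` holds (gene, phased_gt) pairs for the child, e.g. ("GENE_A", "0|1").
    """
    haplotypes_hit: Dict[str, Set[int]] = defaultdict(set)
    for gene, gt in calls:
        if "|" not in gt:
            continue  # unphased call: haplotype of the ALT allele is unknown
        alleles = gt.split("|")
        if len(alleles) != 2 or "." in alleles:
            continue  # skip non-diploid or missing calls
        first, second = alleles
        # Only heterozygous sites count; a 1|1 site is homozygous alternate instead.
        if first != "0" and second == "0":
            haplotypes_hit[gene].add(0)
        elif first == "0" and second != "0":
            haplotypes_hit[gene].add(1)
    # A gene qualifies when ALT alleles were seen on both haplotypes.
    return sorted(g for g, hits in haplotypes_hit.items() if hits == {0, 1})


# Hypothetical phased calls for one child.
calls = [("GENE_A", "0|1"), ("GENE_A", "1|0"), ("GENE_B", "0|1"), ("GENE_B", "0|1")]
print(compound_het_genes(calls))  # ['GENE_A']
```

GENE_B is not reported because both of its ALT alleles sit on the same haplotype, which leaves the other copy of the gene intact.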
