Abstract
Background: Large genotyping datasets have become commonplace due to efficient, cheap methods for SNP identification. Typical genotyping datasets may have thousands to millions of data points per accession, across tens to thousands of accessions. There is a need for tools to help rapidly explore such datasets, to assess characteristics such as overall differences between accessions and regional anomalies across the genome.
Results: We present GCViT (Genotype Comparison Visualization Tool), for visualizing and exploring large genotyping datasets. GCViT can be used to identify introgressions, conserved or divergent genomic regions, pedigrees, and other features for more detailed exploration. The program can be used online or as a local instance for whole genome visualization of resequencing or SNP array data. The program compares variants among user-selected accessions to identify allele differences and similarities between accessions and a user-selected reference, providing visualizations through histogram, heatmap, or haplotype views. The resulting analyses and images can be exported in various formats.
Conclusions: GCViT provides methods for interactively visualizing SNP data on a whole genome scale, and can produce publication-ready figures. It can be used in online or local installations. GCViT enables users to confirm or identify genomic regions of interest associated with particular traits.
Availability: GCViT is freely available at https://github.com/LegumeFederation/gcvit. The 1.0 version described here is available at https://doi.org/10.5281/zenodo.4008713.
Highlights
Large genotyping datasets have become commonplace due to efficient, cheap methods for SNP identification
In this paper we describe a new tool, GCViT (Genotype Comparison Visualization Tool), for dynamic, whole genome visualization of resequencing and SNP array data through histogram, heatmap, or haplotype views of two or more accessions selected from a genotyping dataset
Instructions for deploying an instance of GCViT are provided in the GitHub repository
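The comparison behind a histogram view (counting allele differences between a selected accession and a user-selected reference along the genome) can be sketched as follows. This is a minimal illustration and not GCViT's implementation: the function names, the in-memory variant tuples, the bin size, and the rule of skipping missing calls are all assumptions made for the example.

```python
from collections import Counter


def normalize(genotype):
    """Return alleles as a sorted tuple so that "0/1" and "1|0" compare equal.

    Returns None when any allele is missing (".").
    """
    alleles = genotype.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return tuple(sorted(alleles))


def binned_differences(variants, reference, accession, bin_size):
    """Count sites per (chromosome, bin) where accession differs from reference.

    variants: iterable of (chrom, pos, genotypes), with pos 1-based and
    genotypes a dict mapping accession name to a VCF-style genotype string.
    Sites with a missing call in either accession are skipped (an assumption).
    """
    counts = Counter()
    for chrom, pos, genotypes in variants:
        if reference not in genotypes or accession not in genotypes:
            continue
        ref_gt = normalize(genotypes[reference])
        acc_gt = normalize(genotypes[accession])
        if ref_gt is None or acc_gt is None:
            continue
        if ref_gt != acc_gt:
            counts[(chrom, (pos - 1) // bin_size)] += 1
    return dict(counts)


# Hypothetical toy data: three accessions (A, B, C) at four sites.
variants = [
    ("Chr01", 100, {"A": "0/0", "B": "1/1", "C": "0/0"}),
    ("Chr01", 450, {"A": "0/0", "B": "0/0", "C": "1/1"}),
    ("Chr01", 1200, {"A": "1/1", "B": "0/0", "C": "./."}),
    ("Chr02", 50, {"A": "0/1", "B": "1/0", "C": "1/1"}),
]

print(binned_differences(variants, "A", "B", 1000))
print(binned_differences(variants, "A", "C", 1000))
```

Each key in the result is a chromosome and a zero-based bin index, and each value is the number of differing sites in that bin, which is the kind of quantity a per-bin histogram or heatmap cell would display. Real input would normally be read from a VCF file rather than built by hand.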
Summary
Large genotyping datasets have become commonplace due to efficient, cheap methods for SNP identification. Resequencing and SNP-array projects are used to identify sequence variants between multiple lines, and may be used to perform genome-wide association studies (GWAS) to find variants that are associated with phenotypes. These studies can produce millions of SNPs. For example, Torkamaneh et al [1]. The command line tool Genotype Query Tools (GQT) [2] and its web form, webGQT [3], provide a means of indexing and querying VCF files. Some of these tools include:
Wilkey et al. BMC Genomics (2020) 21:822