Abstract

In October 2019, 46 scientists from around the world participated in the first National Center for Biotechnology Information (NCBI) Structural Variation (SV) Codeathon at Baylor College of Medicine. The charge of this first annual working session was to identify ongoing challenges around the topics of SV and graph genomes and, in response, to design reliable methods to facilitate their study. Over three days, seven working groups each designed and developed new open-source methods to improve the bioinformatic analysis of genomic SVs represented in next-generation sequencing (NGS) data. The groups' approaches addressed a wide range of problems in SV detection and analysis, including quality control (QC) assessments of metagenome assemblies and population-scale VCF files, de novo copy number variation (CNV) detection based on continuous long sequence reads, the representation of sequence variation using graph genomes, and the development of an SV annotation pipeline. This report summarizes the questions and developments that arose during the daily discussions between groups. The new methods are publicly available at https://github.com/NCBI-Codeathons/, and demonstrate that a codeathon devoted to SV analysis can produce valuable new insights both for participants and for the broader research community.

Highlights

  • Structural variants (SVs) are large-scale genomic alterations, frequently defined as greater than 50 bases in length, involving deletions, duplications, insertions, inversions and/or translocations, and can occur in combinations

  • Out of concern for the integrity of current and future metagenomic studies from short-read and long-read genome sequencing data, we found that the quality control of metagenomic assemblies could be substantially improved by using a combination of sequence alignment tools and variant call format (VCF) file outputs from structural variation (SV) callers such as Sniffles v1.0.8[32] and Manta v1.5.0[33]

  • We developed a pipeline to identify de novo structural variants (SVs) from long-read (LR) sequencing data collected from trios

Introduction

Structural variants (SVs) are large-scale genomic alterations, frequently defined as greater than 50 base pairs (bp) in length, involving deletions, duplications, insertions, inversions and/or translocations, and can occur in combinations. In contrast to single nucleotide variants (SNVs), which involve the substitution of a single nucleotide, SVs remain understudied due to their more complex nature[1]. Our understanding of these larger forms of genomic alteration is limited by the sequencing technology and computational methods available to analyze ever-increasing amounts of sequence or similar data. A special type of SV is the copy number variant (CNV). CNVs are unbalanced SVs that can either increase or decrease total DNA content, through duplications and deletions, respectively. More general studies examine the relationship between copy number variation and a range of diseases[13,14].
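
The definitions above can be made concrete with a minimal Python sketch. It is illustrative only and is not taken from any of the codeathon tools: the `Variant` record, its field names, and the helper functions are hypothetical, and the threshold simply encodes the "greater than 50 bp" convention quoted in the text. Real callers report SVs in VCF with richer fields, and a length is not always well defined for translocations.

```python
from dataclasses import dataclass
from typing import Optional

# Conventional size cut-off quoted in the text ("greater than 50 bp").
SV_MIN_LENGTH_BP = 50

# Unbalanced SV types named in the text and their effect on DNA content.
CNV_EFFECT = {"DUP": "gain", "DEL": "loss"}


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record used only for this illustration."""
    kind: str        # "DEL", "DUP", "INS", "INV", "TRA" or "SNV"
    length_bp: int   # size of the altered sequence in base pairs


def is_structural_variant(v: Variant) -> bool:
    """True if the variant is not an SNV and exceeds the 50 bp convention."""
    return v.kind != "SNV" and v.length_bp > SV_MIN_LENGTH_BP


def copy_number_effect(v: Variant) -> Optional[str]:
    """Return "gain" or "loss" for CNVs, or None for balanced SVs and non-SVs."""
    if not is_structural_variant(v):
        return None
    return CNV_EFFECT.get(v.kind)


if __name__ == "__main__":
    for v in (Variant("DUP", 1200), Variant("INV", 800), Variant("SNV", 1)):
        print(v.kind, is_structural_variant(v), copy_number_effect(v))
```

Under these assumptions a 1,200 bp duplication is classified as an SV and a copy number gain, an 800 bp inversion is an SV with no copy number effect, and an SNV is neither.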
